Abstract

We investigate the frequentist posterior contraction rate of nonparametric Bayesian 
procedures in linear inverse problems in both the mildly and severely ill-posed cases. A 
theorem is proved in a general Hilbert space setting under approximation-theoretic as- 
sumptions on the prior. The result is applied to non-conjugate priors, notably sieve and 
wavelet series priors, as well as in the conjugate setting. In the mildly ill-posed setting 
minimax optimal rates are obtained, with sieve priors being rate adaptive over Sobolev 
classes. In the severely ill-posed setting, oversmoothing the prior yields minimax rates. 
Previously established results in the conjugate setting are obtained using this method. 
Examples of applications include deconvolution, recovering the initial condition in the 
heat equation and the Radon transform. 
1 Introduction
1.1 Outline 

In this paper, we consider the problem of using Bayesian methods to estimate an unknown
parameter $f$ from an observation $Y$ generated from the model

$$Y = Y^{(n)} = Af + \frac{1}{\sqrt{n}} Z. \qquad (1.1)$$



Here we assume that $f$ is an element of a separable Hilbert space $\mathbb{H}_1$,
$A : \mathbb{H}_1 \to \mathbb{H}_2$ is a known, injective, continuous linear operator into
another Hilbert space $\mathbb{H}_2$ and $Z$ is a Gaussian white noise. Many specific examples
of regression fall under this general framework, such as deconvolution, recovery of the initial
condition of the heat equation and the Radon transform (see Section 1.4 for details).

In the Bayesian framework, we treat the unknown element $f$ as a random variable and
assign to it a prior distribution $\Pi$, defined on a $\sigma$-algebra $\mathcal{B}$ of (a subset
of) the parameter space $\mathbb{H}_1$. We then condition on the observed data $Y$ to update this
distribution to obtain the posterior distribution $\Pi(\cdot \mid Y)$ and so obtain a sequence of
data-driven random probability distributions. The Bayesian then draws his inference about $f$
based entirely on the posterior distribution. Recently, much focus has been given to the
development of nonparametric procedures, where the support of $\Pi$ is infinite-dimensional.
We wish to study the asymptotic behaviour of the posterior distribution under the frequentist
assumption that the data $Y$ is generated from the model (1.1) for some true parameter
$f_0$. We shall measure this behaviour by considering if and at what rate the posterior contracts
to the true $f_0$ as $n \to \infty$ as defined in [10]. This question has been the object of much study
in recent years (see e.g. [TU1 [HI 13 I2H [27] for some examples), but the situation of inverse 
problems has only recently been considered and then only in the conjugate setting [Hll6|IT7]. 
We shall use a novel approach to study possibly non-conjugate priors; we also recover some 
of the results from [TBI [TTJ . 

Our method of proof follows the testing approach introduced in [10] and thus does not
rely on explicit computation of the posterior. A key ingredient to using this approach is the
construction of suitable tests for the problem

$$H_0 : f = f_0, \qquad H_A : f \in \{f : \|f - f_0\|_{\mathbb{H}_1} > \xi_n\} \qquad (1.2)$$

with exponentially decaying type-II errors for some sequence $\xi_n \to 0$. We follow the
approach of [11] of using the concentration properties of appropriate centred linear estimators
to construct suitable plug-in tests. If the operator $A$ in (1.1) is compact (i.e. the problem
is ill-posed), it effectively "smooths" $f$ and so makes it more difficult to distinguish between
the alternatives $H_0$ and $H_A$ based on the observation $Y$. To deal with this, we use general
analogues of the Fourier techniques used in constructing linear estimators in the case of
density deconvolution (see e.g. [20]). Due to the inverse nature of the problem, it is natural to
construct such estimators using a diagonalizing basis for $A$. Moreover, since our approach
requires good approximation properties within the support of the prior, we consider priors
that are naturally characterized by (small modifications of) such a basis.

A key requirement of this testing approach is that the prior distribution assigns sufficient
mass to a neighbourhood of the true parameter $f_0$. In this framework, this corresponds to
establishing lower bounds for the probability that $Af$ is contained in a small ball centred at
$Af_0$ (the "small-ball problem") under the prior. The inverse nature of the problem turns
out to be of assistance with this condition, since $A$ shrinks $f$ towards the origin. In effect,
$A$ changes the geometry of the problem by converting an $\mathbb{H}_2$-ball into a larger
$\mathbb{H}_1$-ellipsoid, whose precise size increases with the level of ill-posedness. We shall
rely on this notion in our proofs and expand upon the details below.

We apply our general result to prove contraction rates in a number of situations commonly 
arising in Bayesian inference, some adaptive and some not. For instance, in the case of sieve 
priors with random truncation, we show that under weak conditions in the mildly ill-posed 
setting, the procedure is fully rate adaptive over Sobolev classes as in the direct case [2]. In the 
severely ill-posed case, our results suggest that one should calibrate the prior according to the 
operator A at hand. In this case, oversmoothing the prior by a suitable factor is sufficient to 
obtain a minimax rate of contraction. This is not surprising since centred linear estimators in 
the severely ill-posed case are often adaptive (see e.g. [20] for results on density estimation)
and our tests are built around such estimators. In this setting, unless the prior satisfies 
an analytic smoothness condition, the bias of the linear estimator dominates its variance 
and consequently the minimum of the prior smoothness and the unknown true smoothness 
determines the rate. Since we construct our tests using a bias-variance decomposition of a 
linear estimator, it seems reasonable that our rate will reflect this. 

When considering the specific example of deconvolution, we also consider a wavelet series 
prior on [0,1]. While it is canonical to work in the diagonalizing basis of A, in this case 
the Fourier basis, our results allow some flexibility in considering different yet closely related 
bases; in particular, this allows us to consider priors constructed using band-limited wavelets.
This turns out to have useful consequences since we can use the functional characterization
properties of wavelets to reflect a greater variety of prior assumptions - notably we consider
Hölder smoothness assumptions in addition to Sobolev ones.

Unless otherwise stated, $\langle \cdot, \cdot \rangle_i$ and $\|\cdot\|_i$ denote the inner
product and norm of the Hilbert space $\mathbb{H}_i$, $i = 1, 2$. For $x, y \in \mathbb{R}$ we use
the notation $x \lesssim y$ to denote that $x \le Ky$ for some universal constant $K$. For
sequences $\{a_n\}$ and $\{b_n\}$ we write $a_n \asymp b_n$ to mean that there exist constants
$C_1, C_2 > 0$ such that $C_1 a_n \le b_n \le C_2 a_n$ for all $n \ge 1$. We may also sometimes use
the same letter to denote a constant that varies from line to line.

1.2 Linear inverse problems 

The Gaussian white noise $Z$ in (1.1) is the iso-normal or iso-Gaussian process for
$\mathbb{H}_2$. Since $Z$ is not realisable as a Gaussian random element of $\mathbb{H}_2$, we
interpret the model in process form (see e.g. [3]), that is we consider
$Z = (Z_h : h \in \mathbb{H}_2)$ as a mean-zero Gaussian process with covariance
$\mathbb{E} Z_h Z_{h'} = \langle h, h' \rangle_2$. In this form, (1.1) is interpreted as observing
the Gaussian process $Y = (Y_h : h \in \mathbb{H}_2)$, where

$$Y_h = \langle Af, h \rangle_2 + \frac{Z_h}{\sqrt{n}}.$$

It is statistically equivalent to observe the subprocess $(Y_{h_k} : k \in \mathbb{N})$, for any
orthonormal basis $\{h_k\}_{k \in \mathbb{N}}$ of $\mathbb{H}_2$. This corresponds to observing the
sequence $(Y_{h_k})_{k \in \mathbb{N}}$, where the $Y_{h_k}$ are distributed as
$N(\langle Af, h_k \rangle_2, n^{-1})$ independently.

As is natural in inverse problems we consider bases $\{e_k\}$ of $\mathbb{H}_1$ that diagonalize
$A$. Denote by $A^*$ the adjoint of the operator $A$. If $A$ is a compact operator, then we can
use the singular value decomposition (SVD) to obtain such a basis. Applying the spectral theorem
to the compact self-adjoint operator $A^*A : \mathbb{H}_1 \to \mathbb{H}_1$, we know that $A^*A$
has a discrete spectrum consisting of positive eigenvalues $\{\rho_k^2\}_{k \in \mathbb{N}}$
(possibly together with 0) and a corresponding orthonormal basis $\{e_k\}$ of $\mathbb{H}_1$ of
eigenfunctions (see e.g. [23]). We then have a conjugate orthonormal basis $\{g_k\}$ of the range
of $A$ in $\mathbb{H}_2$ defined by the equality $Ae_k = \rho_k g_k$. Letting
$f_k := \langle f, e_k \rangle_1$, the action of $A$ on $f$ has a simple form when considered in
this basis: $Af = \sum_k A(f_k e_k) = \sum_k \rho_k f_k g_k$. Writing $Y_k := Y_{g_k}$, (1.1) is
statistically equivalent to observing the sequence $(Y_k)$ of independent observations, where
$Y_k$ has distribution $N(\rho_k f_k, n^{-1})$. The task of estimating $f$ thus reduces to that of
estimating the sequence $\{f_k\}$ from the sequence of independent observations $(Y_k)$.
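
The reduction to the sequence model can be illustrated numerically. The following is a minimal
sketch, assuming real coefficients and a truncation to finitely many coordinates; the polynomially
decaying coefficients and singular values are hypothetical choices made only for illustration, and
the function names are introduced here rather than taken from the paper.

```python
import math
import random

def simulate_sequence_model(f_coeffs, rho, n, rng):
    """Draw Y_k = rho_k * f_k + Z_k / sqrt(n) with Z_k i.i.d. N(0, 1)."""
    sd = 1.0 / math.sqrt(n)
    return [r * f + sd * rng.gauss(0.0, 1.0) for r, f in zip(rho, f_coeffs)]

def naive_inverse(y, rho, cutoff):
    """Estimate f_k by Y_k / rho_k for the first `cutoff` coordinates, zero elsewhere."""
    return [y[k] / rho[k] if k < cutoff else 0.0 for k in range(len(y))]

rng = random.Random(0)
K = 50
f_true = [k ** -1.5 for k in range(1, K + 1)]  # hypothetical true coefficients
rho = [k ** -1.0 for k in range(1, K + 1)]     # hypothetical singular values
y = simulate_sequence_model(f_true, rho, n=10_000, rng=rng)
f_hat = naive_inverse(y, rho, cutoff=10)
```

Dividing by $\rho_k$ multiplies the noise in coordinate $k$ by $1/\rho_k$, which is why the sketch
stops inverting after a cutoff instead of using every coordinate.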

Whilst priors based on a decomposition of $f$ in the $\{e_k\}$ basis are frequently natural, it is
often of interest to consider slightly more general types of bases. We therefore consider any
basis whose elements consist of finite linear combinations of the $\{e_k\}$.

Condition 1. Suppose that $\{\phi_k\}$ is an orthonormal basis for $\mathbb{H}_1$ such that for
each $k$, the set $\{l : |\langle \phi_k, e_l \rangle_1| \ne 0\}$ is finite.

This seemingly small extension actually has large implications for the possible choice of priors.
For example, if the SVD is the Fourier basis (e.g. deconvolution, see Section 1.4.1 below for
more details), then Condition 1 corresponds to a band-limited basis. Band-limited wavelets
have been used in the deconvolution setting (e.g. [14, 22]), and this allows us to use the
superior characterization properties of wavelets to create priors that model Hölder smoothness
conditions rather than Sobolev smoothness conditions, which we do using periodized Meyer
wavelets in Section 2.3.

In any case, we shall assume the existence of such an orthonormal basis $\{e_k\}$ of eigenvectors
of $A^*A$, though we do not necessarily assume that $A$ is compact. The principal additional
case we include is the white noise model, when $A$ is the identity operator. If $\rho_k \to 0$,
the problem is ill-posed since the noise to signal ratio of the components tends to infinity as
$k \to \infty$. Recovering $f$ from $Y$ is then an inverse problem. The severity of this
ill-posedness can be characterized by the rate of decay of $\rho_k \to 0$; the faster this rate,
the more difficult the estimation problem. As is common in the literature, we shall classify the
problem using the following standard classes.

Condition 2. We say that the problem is mildly ill-posed with regularity $\alpha$ if

$$C_1 (1 + k^2)^{-\alpha/2} \le |\rho_k| \le C_2 (1 + k^2)^{-\alpha/2} \quad \text{as } k \to \infty$$

for some constants $C_1, C_2, \alpha > 0$.

Condition 3. We say that the problem is severely ill-posed with regularity $\beta$ if

$$C_1 (1 + k^2)^{-\alpha_0/2} e^{-c_0 k^\beta} \le |\rho_k| \le C_2 (1 + k^2)^{-\alpha_1/2} e^{-c_0 k^\beta} \quad \text{as } k \to \infty$$

for some constants $C_1, C_2, \alpha_0, \alpha_1, \beta > 0$.

The polynomial terms in Condition 3 are included to add flexibility, but do not characterize
the problem since they are dominated by the exponential terms.

We shall classify the smoothness of functions via the Sobolev scales with respect to the
basis $\{e_k\}$. For $s \in \mathbb{R}$, define

$$\|f\|_{H^s(\mathbb{H}_1)}^2 := \sum_{k=1}^\infty |f_k|^2 (1 + k^2)^s,$$

where $f_k = \langle f, e_k \rangle_1$. We shall generally omit reference to the underlying space
$\mathbb{H}_1$ when there is no confusion possible. The Sobolev space of order $s \in \mathbb{R}$ is
then defined as $H^s = \{f \in \mathbb{H}_1 : \|f\|_{H^s} < \infty\}$. Note that this concept of
smoothness is intrinsically linked to the operator $A$ through the choice of the basis $\{e_k\}$.
To be precise, the space $H^s$ should be indexed by both $\mathbb{H}_1$ and $A$, since it quantifies
smoothness with respect to the operator $A$, but we omit this explicit link to simplify notation.
For $\gamma > 0$, it is known (e.g. [7]) that the minimax rate of estimation over any fixed ball of
$H^\gamma$ is $n^{-\gamma/(2\alpha+2\gamma+1)}$ under Condition 2 and $(\log n)^{-\gamma/\beta}$
under Condition 3. Minimax rates are attained by a number of methods, such as generalized
Tikhonov regularization amongst others [3, 7]. In general, we shall use $\alpha$ and $\beta$ to
refer to parameters quantifying the ill-posedness of the problem (1.1), $\gamma$ to refer to the
smoothness of the true function $f_0$ and $\delta$ to quantify the prior smoothness.
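
To make the gap between the two regimes concrete, the two minimax rates quoted above, namely
n^(-gamma/(2 alpha + 2 gamma + 1)) in the mildly ill-posed case and (log n)^(-gamma/beta) in the
severely ill-posed case, can be evaluated directly. This is only an illustrative sketch: constants
are ignored and the function names are introduced here.

```python
import math

def minimax_rate_mild(n, gamma, alpha):
    """Polynomial rate n^(-gamma / (2*alpha + 2*gamma + 1)) for the mildly ill-posed case."""
    return n ** (-gamma / (2 * alpha + 2 * gamma + 1))

def minimax_rate_severe(n, gamma, beta):
    """Logarithmic rate (log n)^(-gamma / beta) for the severely ill-posed case."""
    return math.log(n) ** (-gamma / beta)

# Hypothetical smoothness and ill-posedness parameters, chosen only for illustration.
for n in (10**3, 10**6, 10**9):
    print(n, minimax_rate_mild(n, gamma=1.0, alpha=1.0), minimax_rate_severe(n, gamma=1.0, beta=2.0))
```

Setting alpha = 0 in the first function gives the familiar direct-case rate n^(-gamma/(2 gamma + 1)),
while the second function decays only logarithmically however smooth the truth is.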

A key ingredient in proving contraction is establishing lower bounds for the small-ball
probability of $Af$ about $Af_0$ (see (3.6) below). As mentioned above, if $A$ is compact then it
changes the geometry of the problem by converting it into a small-ellipsoid problem in
$\mathbb{H}_1$. Under Condition 2,

$$\|Af\|_2^2 = \Big\| \sum_{k=1}^\infty \rho_k f_k g_k \Big\|_2^2 = \sum_{k=1}^\infty \rho_k^2 |f_k|^2 \le C_2^2 \sum_{k=1}^\infty |f_k|^2 (1 + k^2)^{-\alpha} = C_2^2 \|f\|_{H^{-\alpha}}^2,$$
so that we are actually considering the small-ball probability of $f$ under the weaker negative
Sobolev norm $H^{-\alpha}$, since the dimensions of the ellipsoid correspond to the singular values
of $A$. To establish (3.6) in the mildly ill-posed case, it is therefore sufficient to prove

$$\Pi_n(f \in \mathcal{P} : C_2 \|f - f_0\|_{H^{-\alpha}} \le \varepsilon_n) \ge e^{-Cn\varepsilon_n^2}. \qquad (1.3)$$

In fact, the greater the ill-posedness of $A$, the greater the prior mass assigned to an
$\mathbb{H}_2$-neighbourhood of $Af_0$, and consequently the "nicer" the geometry of the problem.
As a concrete example, if $\{e_k\}$ is the Fourier basis acting on the torus $\mathbb{T} = [0, 1)$,
then the singular values $\{\rho_k\}$ act as Fourier multipliers and we recover the usual
definition of (negative) Sobolev smoothness via Fourier series on $\mathbb{T}$. Using the same
notion, Condition 3 induces an even weaker norm with exponential weighting.
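
The identification of the norm of Af with a negative Sobolev norm of f can be checked on
coefficient sequences. The sketch below assumes singular values exactly equal to
(1 + k^2)^(-alpha/2), so that the constant in Condition 2 is 1, and uses a hypothetical
coefficient sequence; both choices are for illustration only.

```python
def sobolev_norm_sq(coeffs, s):
    """Squared H^s norm: sum over k >= 1 of |f_k|^2 (1 + k^2)^s."""
    return sum(abs(f) ** 2 * (1 + k * k) ** s for k, f in enumerate(coeffs, start=1))

def image_norm_sq(coeffs, rho):
    """Squared norm of Af in the conjugate basis: sum over k of rho_k^2 |f_k|^2."""
    return sum((r * abs(f)) ** 2 for r, f in zip(rho, coeffs))

alpha = 1.0
coeffs = [1.0 / k for k in range(1, 201)]                   # hypothetical coefficients of f
rho = [(1 + k * k) ** (-alpha / 2) for k in range(1, 201)]  # assumed singular values
lhs = image_norm_sq(coeffs, rho)
rhs = sobolev_norm_sq(coeffs, -alpha)
```

Here `lhs` and `rhs` agree up to rounding, and both are smaller than the squared H^0 norm of the
same sequence, which is the sense in which the ball for Af is easier for the prior to hit.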



1.3 The posterior distribution 

In the non-conjugate situation, it is in general not possible to obtain a closed form expression
for the posterior distribution. For $f \in \mathbb{H}_1$, let $P_f$ denote the law of the model
(1.1) so that $Y$ is an iso-Gaussian process with drift $Af$ under $P_f$. Using the sequence space
model, $P_f$ is statistically equivalent to

$$\bigotimes_{k=1}^\infty N(\rho_k f_k, n^{-1}).$$

As mentioned in the proof of Proposition 3.1 of [16], Kakutani's product martingale theorem
shows that for any $f \in \mathbb{H}_1$, this measure is equivalent to
$\bigotimes_{k=1}^\infty N(0, n^{-1})$ with affinity
$\exp\big(-\frac{n}{8} \sum_k \rho_k^2 f_k^2\big) > 0$. The family of distributions
$(P_f : f \in \mathbb{H}_1)$ is therefore dominated by the law $P_0$ (denoting here the law of a
pure white noise rather than the "true" law $P_{f_0}$) with density

$$\frac{dP_f}{dP_0}(Y) = \exp\Big( \sqrt{n} \sum_{k=1}^\infty \rho_k f_k Z_k - \frac{n}{2} \sum_{k=1}^\infty \rho_k^2 f_k^2 \Big) = \exp\Big( \sqrt{n} \sum_{k=1}^\infty \rho_k f_k Z_k - \frac{n}{2} \|Af\|_2^2 \Big),$$

where $Z_k = Z_{g_k}$. This is "almost" the Cameron-Martin theorem and if $Z$ were realizable as
a Gaussian element in $\mathbb{H}_2$, then the first term in the exponent would reduce to
$\sqrt{n} \langle Af, Z \rangle_2$. Since under $P_0$, $Z_k = \sqrt{n} Y_k$, we can express the
posterior distribution via Bayes' formula:

$$\Pi(B \mid Y) = \frac{\int_B e^{n \sum_k \rho_k f_k Y_k - \frac{n}{2} \|Af\|_2^2} \, d\Pi(f)}{\int_{\mathcal{P}} e^{n \sum_k \rho_k f_k Y_k - \frac{n}{2} \|Af\|_2^2} \, d\Pi(f)}, \qquad B \in \mathcal{B}, \qquad (1.4)$$

where $\mathcal{P}$ is the support of the prior $\Pi$. Obtaining an expression of this form for the
posterior makes it possible to use the approach of Theorem 2.1 of [10], a fact that we shall use
implicitly in the proof of Theorem 3.1.
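
Bayes' formula for the posterior is easiest to see for a prior supported on finitely many
candidate functions, where the integrals become sums. The sketch below is an illustration under
that simplifying assumption (a discrete prior and real coefficients, neither of which is the
setting of the paper); the log-density is n * sum_k rho_k f_k Y_k - (n/2) * sum_k rho_k^2 f_k^2,
and the function names are introduced here.

```python
import math

def log_density(f, y, rho, n):
    """Log of dP_f/dP_0 at Y: n * sum(rho_k f_k Y_k) - (n/2) * sum(rho_k^2 f_k^2)."""
    linear = sum(r * fk * yk for r, fk, yk in zip(rho, f, y))
    quadratic = sum((r * fk) ** 2 for r, fk in zip(rho, f))
    return n * linear - 0.5 * n * quadratic

def posterior_weights(candidates, prior_weights, y, rho, n):
    """Posterior probabilities of finitely many candidates, computed stably in log space."""
    logs = [math.log(w) + log_density(f, y, rho, n) for f, w in zip(candidates, prior_weights)]
    top = max(logs)
    unnormalised = [math.exp(v - top) for v in logs]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Two hypothetical one-coordinate candidates with equal prior mass.
weights = posterior_weights([[0.0], [1.0]], [0.5, 0.5], y=[1.0], rho=[1.0], n=4)
```

Subtracting the largest log-weight before exponentiating avoids overflow when n is large, since
the exponent in the formula scales linearly in n.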

1.4 Examples 

Note that if $\mathbb{H}_1 = L^2([0, 1])$ and $\mathbb{H}_2 = H^1([0, 1])$ then we can rewrite
(1.1) in the more classical white noise form

$$dY(t) = \mathcal{A}f(t)\,dt + n^{-1/2}\,dW(t),$$

where $W$ is a standard Brownian motion on $[0, 1]$ and $\mathcal{A}f(t) = \frac{d}{dt} Af(t)$. In
this setting, the direct case corresponds to taking $\mathcal{A}$ to be the identity operator, so
that $Af(t) = \int_0^t f(s)\,ds$. Our results apply to the following situations amongst others (see
[7] for a general overview of inverse problems).
1.4.1 Deconvolution

A common problem in signal and image processing is periodic deconvolution (see e.g. [14]).
Consider the 1-dimensional case on the torus $\mathbb{T} = [0, 1)$ and, assuming that $f$ is a
1-periodic function, define

$$Af(t) = \int_0^t f * \mu(s)\,ds, \qquad t \in [0, 1], \qquad (1.5)$$

for some known finite signed measure $\mu$, where $f * \mu$ stands for convolution on $\mathbb{T}$
and where addition is defined modulo 1. This fits into the above framework since
$\|f * \mu\|_{L^2} \le \|f\|_{L^2} \|\mu\|_{TV}$ by the Minkowski integral inequality, where
$\|\cdot\|_{TV}$ denotes the total variation norm for measures. For such a $\mu$, we can therefore
consider $A$ as a map from $L^2([0, 1])$ to $H^1([0, 1])$. We observe $Y$ arising from the model
$dY_t = f * \mu(t)\,dt + n^{-1/2}\,dW_t$, where $W$ is a standard Brownian motion on $[0, 1]$. The
SVD basis is the Fourier basis $e_k(x) = e^{2\pi i k x}$, $k \in \mathbb{Z}$, with associated
eigenvalues given by the Fourier coefficients of $\mu$, namely
$\rho_k = \hat{\mu}_k = \int_0^1 e_k(x)\,d\mu(x)$. The problem can be either mildly (e.g. [14]) or
severely ill-posed depending on the choice of measure $\mu$. Note that the Dirac measure
$\delta_0$ is admissible under this model and corresponds to the direct observation case. This
situation can be generalized to higher dimensions.



1.4.2 Heat equation 

Consider the periodic boundary problem for the 1-dimensional heat equation

$$\frac{\partial}{\partial t} u(x, t) = \frac{\partial^2}{\partial x^2} u(x, t), \qquad u(x, 0) = f(x), \qquad u(0, t) = u(1, t),$$

where $u : [0, 1] \times [0, T] \to \mathbb{R}$ and the initial condition $f \in L^2([0, 1])$ is
1-periodic. The task is to recover the initial condition $f$ from a noisy observation of $u$ at
time $T$. The solution to this problem is given by

$$u(x, t) = \sum_{k=1}^\infty f_k e^{-\pi^2 k^2 t} e_k(x),$$

where $f_k = \langle f, e_k \rangle_{L^2}$ with $e_k(x) = \sqrt{2} \sin(k \pi x)$. Thus we can
express $u(\cdot, T) = Af$ with $\rho_k = e^{-\pi^2 k^2 T}$. Recovering $f$ from an observation
$u(\cdot, T)$ corrupted by a white noise of intensity $n^{-1/2}$ thus leads to a severely ill-posed
inverse problem satisfying Condition 3 with $\beta = 2$. This problem has been studied in the
Bayesian context under conjugate Gaussian priors in [17].
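
The severity of this example can be seen by counting how many frequencies carry a signal scale
above the noise level n^(-1/2). The sketch below assumes singular values of the form
exp(-pi^2 k^2 T), which is what the sine-series solution of u_t = u_xx gives; the value T = 0.1
and the function names are hypothetical choices for illustration.

```python
import math

def heat_singular_value(k, T):
    """Assumed singular value exp(-pi^2 k^2 T) of the heat operator at frequency k."""
    return math.exp(-(math.pi ** 2) * k * k * T)

def resolvable_frequencies(n, T, k_max=1000):
    """Number of leading frequencies k with singular value at least the noise level n^(-1/2)."""
    noise = 1.0 / math.sqrt(n)
    count = 0
    for k in range(1, k_max + 1):
        if heat_singular_value(k, T) < noise:
            break
        count += 1
    return count

few = resolvable_frequencies(10**6, T=0.1)    # 2 frequencies
more = resolvable_frequencies(10**12, T=0.1)  # 3 frequencies
```

Under these assumptions, squaring the sample size from 10^6 to 10^12 adds a single resolvable
frequency, which is the mechanism behind logarithmic rates in the severely ill-posed case.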



1.4.3 Radon transform 

Another example is given by the Radon transform, which is used in computerized tomography
(see [15] for more details). Let $D = \{x \in \mathbb{R}^2 : \|x\| \le 1\}$ and suppose that
$f : D \to \mathbb{R}$ is some function in $L^2(D)$ (with Lebesgue measure) that we wish to
estimate based on observations of the integrals of $f$ along all lines intersecting $D$.
Parametrize the lines by the length $s \in [0, 1]$ of their perpendicular from the origin and the
angle $\varphi \in [0, 2\pi)$ of the perpendicular to the x-axis. The Radon transform is defined as

$$Af(s, \varphi) = \frac{\pi}{2\sqrt{1 - s^2}} \int_{-\sqrt{1 - s^2}}^{\sqrt{1 - s^2}} f(s \cos\varphi - t \sin\varphi, s \sin\varphi + t \cos\varphi)\,dt,$$

where $(s, \varphi) \in S = [0, 1] \times [0, 2\pi)$. The Radon transform can be considered as a map
$A : L^2(D) \to L^2(S, \mu)$, where $d\mu(s, \varphi) = 2\pi^{-1} \sqrt{1 - s^2}\,ds\,d\varphi$ and
consequently fits into the framework of (1.1). Considered as such, $A$ is a bijective and bounded
operator with SVD that can be computed using Zernike polynomials, leading to a mildly ill-posed
problem satisfying Condition 2 with $\alpha = 1/2$ (see [15] for more details).



2 Main results 

We analyse the contraction properties of a number of priors in the inverse problem setting
under the assumption that $Y$ has law $P_{f_0}$ for some unknown $f_0 \in \mathbb{H}_1$.



2.1 Sieve priors 

Consider a sieve prior in the orthonormal basis $\{e_k\}$ that diagonalizes the operator $A^*A$.
We take

$$f = \sum_{k=1}^M f_k e_k, \qquad (2.1)$$

where $M$ has probability mass function $h$ on $\mathbb{N}$ with distribution function $H$. We take
the $\{f_k\}$ to be independent (real or complex as required) random variables with density
$\tau_k^{-1} q(\tau_k^{-1}\,\cdot)$, for some sequence $\{\tau_k\}$ to be specified below, and for
$q$ some fixed density. The prior can thus be expressed as

$$\Pi = \sum_{m=1}^\infty h(m)\,\Pi_m,$$

where $\Pi_m(x_1, \dots, x_m) = \prod_{k=1}^m \tau_k^{-1} q(x_k / \tau_k)$. Priors of this form have
been studied (e.g. pU [29]) and, under suitable conditions on $h$ and $\Pi_m$, are adaptive over
Sobolev smoothness classes in the non ill-posed case [2, 13]. Upon suitable calibration of the
prior with respect to $A$, this adaptation property extends to the ill-posed case when considered
over the classes $H^\gamma(\mathbb{H}_1)$ for $\gamma > 0$. We firstly make the following assumption
on $q$.

Condition 4. The density $q : \mathbb{R}$ (or $\mathbb{C}$) $\to [0, \infty)$ satisfies

$$D e^{-d|x|^w} \le q(x)$$

for all $x \in \mathbb{R}$ (or $\mathbb{C}$) and some constants $D, d > 0$ and $w > 1$.
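
A draw from a sieve prior of this type can be sketched as follows: sample the truncation level M,
then sample M independent scaled coefficients. The sketch assumes a geometric law for M, with
mass proportional to exp(-b m), and a standard Gaussian density q, which has tails of the form
exp(-d|x|^2); the scale sequence `tau` and all function names are hypothetical.

```python
import math
import random

def sample_truncation(b, rng):
    """Draw M on {1, 2, ...} with P(M = m) proportional to exp(-b * m) (geometric law)."""
    p = 1.0 - math.exp(-b)
    m = 1
    while rng.random() > p:
        m += 1
    return m

def sample_sieve_prior(b, tau, rng):
    """Draw coefficients (f_1, ..., f_M) with f_k = tau(k) * xi_k and xi_k standard Gaussian."""
    m = sample_truncation(b, rng)
    return [tau(k) * rng.gauss(0.0, 1.0) for k in range(1, m + 1)]

rng = random.Random(0)
draw = sample_sieve_prior(b=1.0, tau=lambda k: 1.0, rng=rng)  # constant scales tau_k = 1
```

The returned list holds the coefficients of one prior draw in the diagonalizing basis; its random
length is what lets the prior place mass on models of every finite dimension.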

Our first result shows that if the true parameter is actually of the form (2.1), then in the
mildly ill-posed case we recover a $\sqrt{n}$-rate up to a logarithmic factor.

Proposition 2.1. Suppose that $A$ satisfies Condition 2 with regularity $\alpha$ and that the true
function $f_0$ is a finite series in the $\{e_k\}$-basis. Let $0 < h(m) \le Be^{-bm}$ for some
constants $B, b > 0$ and all $m \in \mathbb{N}$ and suppose that the density $q$ satisfies
Condition 4 for some $w > 1$. Then for a sufficiently large constant $C > 0$,

$$\Pi\Big( f \in \mathbb{H}_1 : \|f - f_0\|_1 > C \frac{(\log n)^{\alpha + 1/2}}{\sqrt{n}} \,\Big|\, Y \Big) \to 0$$

in $P_{f_0}$-probability as $n \to \infty$.
When the true regression function is not exactly of this form, we naturally expect a slower
rate of convergence. The next result deals with the case where we consider a general function
lying in some Sobolev class $H^\gamma$, $\gamma > 0$. We introduce a parameter
$\gamma_0 \le \gamma$ that represents a known a-priori lower bound on the unknown smoothness and
allows use of a more tightly concentrated prior. Note that the choice $\gamma_0 = 0$ is valid in
the following theorem and so a non-trivial lower bound is not necessarily assumed.

Proposition 2.2. Suppose that the true function $f_0$ is in $H^\gamma(\mathbb{H}_1)$ for some
$\gamma > 0$ and that $A$ satisfies Condition 2 with regularity $\alpha$. Consider the prior $\Pi$
described above with $B_1 e^{-b_1 m} \le h(m) \le B_2 e^{-b_2 m}$ for all $m \in \mathbb{N}$, for
some constants $B_1, B_2, b_1, b_2 > 0$, and with density $q$ satisfying Condition 4 for some
$w > 1$. Suppose moreover that the scale parameters
satisfy B 3 (l + k 2 )~^/ 2 {\og k)~ l / w < r k < B 4 (l + /c 2 )( a+1 )/ 2 for some B 3 , B 4 > and 7o < 7. 
Then for a sufficiently large constant C > 0, 



in Ff -probability as n — > 00, where n = 20+27+ 1 ' 

We firstly note that this prior gives a fully adaptive convergence rate over all the Sobolev
classes $H^\gamma$ up to a logarithmic factor. It is worth commenting on the bounds needed on
$\{\tau_k\}$, both of which are used to establish the small-ball condition (3.6), and which depend
on the operator $A$ and the lower bound $\gamma_0$. Note that the choices $\tau_k = \tau$ for all
$k$, corresponding to the $\{f_k\}$ being i.i.d., or decaying coefficients
$\tau_k \asymp (\log k)^{-1/w}$ both satisfy the conditions of Proposition 2.2 and require no
assumptions on the unknown smoothness. The requirements on $\{\tau_k\}$ are therefore no real
imposition and merely add flexibility when calibrating the prior. The lower bound reflects that
the prior cannot (up to a logarithmic factor) pick coefficients that decay faster than those of
$f_0$. If a non-trivial lower bound $\gamma_0 > 0$ is a-priori known, then smoothing the prior to
incorporate this information would yield a more concentrated prior, thereby reducing the size of
credible sets whilst not affecting the rate. The upper bound is extremely mild and actually allows
the size of the components to increase with $k$. It ensures that the moments of $(Af)_k$ (assuming
that they exist) are $O(1)$ as $k \to \infty$, so that the prior component moments cannot grow
faster than the operator $A$ can regularize them, thus allowing the use of larger variances than
would be possible in the direct case ($\alpha = 0$).

When working in the severely ill-posed case, we must calibrate our prior to the degree of
ill-posedness (i.e. the parameter $\beta$). When the true parameter is a finite series in the
$\{e_k\}$ basis, we again recover a $\sqrt{n}$-rate up to some strictly subpolynomial factor that
grows more quickly than the logarithmic factor arising in the mildly ill-posed case in
Proposition 2.1.

Proposition 2.3. Suppose that $A$ satisfies Condition 3 and that the true function $f_0$ is a
finite series in the $\{e_k\}$-basis. Suppose that $q$ satisfies Condition 4 for some $w > 1$, let
$h(m) > 0$ for all $m \in \mathbb{N}$ and suppose that $1 - H(m) \le e^{-bm^{\beta+1}}$ as
$m \to \infty$ for some constant $b$. Then for a sufficiently large constant $C > 0$,

$$\Pi\Big( f \in \mathbb{H}_1 : \|f - f_0\|_1 > C \frac{\omega_n}{\sqrt{n}} \,\Big|\, Y \Big) \to 0$$

in $P_{f_0}$-probability as $n \to \infty$, where
$\omega_n := (\log n)^{\frac{2\alpha_0 + \beta + 1}{2(\beta + 1)}} \exp\big( c (\log n)^{\frac{\beta}{\beta + 1}} \big)$
grows more slowly than any power of $n$.
Since the bias strictly dominates the variance in the severely ill-posed case, we must
calibrate the tails of the distribution $H$ of $M$ according to the ill-posedness of the problem
to verify the bias condition (3.5): indeed the more difficult the problem (larger $\beta$) the
thinner tails we require. In the severely ill-posed case, the dominating behaviour of the bias
means we need a more careful control of the approximation error. We therefore assume that the
density $q$ is that of a standard Gaussian distribution. Note that $\delta$ in the following
proposition corresponds to the Sobolev smoothness of a prior element.

Proposition 2.4. Suppose that the true function $f_0$ is in $H^\gamma(\mathbb{H}_1)$ for some
$\gamma > 0$ and that $A$ satisfies Condition 3. Suppose that the prior $H$ satisfies
$h(m) \ge B_1 e^{-bm^{\beta+1}}$ for all $m \ge 1$ and that
$1 - H(m) \le B_2 \exp(-bm^{\beta+1})$ as $m \to \infty$ for some constants $B_1, B_2, b > 0$.
Suppose moreover that the density $q$ is standard Gaussian and that the scale parameters satisfy
$\tau_k = (1 + k^2)^{-\delta/2 - 1/4}$ for some $\delta > \beta/2$. Then for a sufficiently large
constant $C > 0$,

$$\Pi\Big( f \in \mathbb{H}_1 : \|f - f_0\|_1 > C (\log n)^{-\frac{(\delta - \beta/2) \wedge \gamma}{\beta}} \,\Big|\, Y \Big) \to 0$$

in $P_{f_0}$-probability as $n \to \infty$.

Note that the two conditions on $H$ are mutually satisfiable and that the exponential tails
used in Propositions 2.1 and 2.2 satisfy this tail condition corresponding to $\beta = 0$. In the
severely ill-posed case, oversmoothing the prior by a factor of $\beta/2$ yields the minimax rate
of convergence. This factor increases with the ill-posedness of the problem and arises from the
lower bounds used for the small-ball probability of $Af$. The lack of adaptation in this case
results from the combination of the constraints (3.3) and (3.5), which are more stringent in
the dominating bias case.



2.2 Gaussian priors 

Consider now the conjugate situation where we take $\Pi$ to be a Gaussian measure on $\mathbb{H}_1$. The
conjugate situation provides a canonical example in that the posterior distribution can be 
computed explicitly in this situation, and so provides a useful reference point for the accuracy 
of our approach. Recall that a Gaussian distribution has support equal to the closure of its 
reproducing kernel Hilbert space (RKHS) H (see e.g. [26] for more details); since the posterior 
has the same support, consistency is only achievable when Af$ is contained in this set. 

A Gaussian distribution $N(\nu, \Lambda)$ on $\mathbb{H}_1$ is characterized by a mean element
$\nu \in \mathbb{H}_1$ and a covariance operator $\Lambda : \mathbb{H}_1 \to \mathbb{H}_1$, which is
a positive semi-definite, self-adjoint and trace class linear operator. A random element $G$ in
$\mathbb{H}_1$ has $N(\nu, \Lambda)$ distribution if and only if the stochastic process
$(\langle G, h \rangle_1 : h \in \mathbb{H}_1)$ is a Gaussian process with

$$\mathbb{E}\langle G, h \rangle_1 = \langle \nu, h \rangle_1, \qquad \mathrm{cov}(\langle G, h \rangle_1, \langle G, h' \rangle_1) = \langle h, \Lambda h' \rangle_1.$$

We now take the prior to be a mean-zero Gaussian distribution so that
$f \sim \Pi = N(0, \Lambda)$. We shall make the following assumption as in [16, 17].

Condition 5. Suppose that the operators $A^*A$ and $\Lambda$ have the same set of eigenvectors
$\{e_k\}$ with eigenvalues $\{\rho_k^2\}$ and $\{\tau_k^2\}$ respectively, with
$\tau_k^2 = (1 + k^2)^{-\delta - 1/2}$ and $\rho_k$ satisfying either Condition 2 or 3 as specified.
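
When the prior covariance and the operator share eigenvectors, the posterior factorises over
coordinates and each factor follows from standard normal-normal conjugacy. The sketch below is
that textbook computation, not a formula taken from the paper: with prior f_k ~ N(0, tau_k^2) and
observation Y_k ~ N(rho_k f_k, 1/n), the posterior of f_k is Gaussian with precision
1/tau_k^2 + n rho_k^2. The function name is introduced here and real coefficients are assumed.

```python
def coordinate_posterior(y_k, rho_k, tau_k, n):
    """Posterior mean and variance of f_k given Y_k under the conjugate Gaussian model."""
    precision = 1.0 / tau_k ** 2 + n * rho_k ** 2  # prior precision plus data precision
    variance = 1.0 / precision
    mean = variance * n * rho_k * y_k
    return mean, variance

mean, variance = coordinate_posterior(y_k=2.0, rho_k=1.0, tau_k=1.0, n=1)
```

For coordinates where n rho_k^2 is small compared with 1/tau_k^2, the posterior variance stays
close to the prior variance, so the data are uninformative at high frequencies in ill-posed
problems.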
The parameter $\delta$ represents the smoothness of the prior in that $f \in H^s(\mathbb{H}_1)$
for all $s < \delta$ almost surely. In particular,
$\mathbb{E}\|f\|_{H^s}^2 = \sum_{k=1}^\infty (1 + k^2)^{s - \delta - 1/2} < \infty$ if and only if
$s < \delta$. The mildly ill-posed case is dealt with in [16] using the conjugacy of the prior and
we recover the same rates using our testing approach combined with the results of [27] (though we
do not consider the case of scaling). We firstly obtain (a subset of) the results of Theorem 4.1
of [16].

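Proposition 2.5. Suppose that $A$ satisfies Condition 2, that $f_0 \in H^\gamma(\mathbb{H}_1)$ for
some $\gamma > 0$, and assign $f$ the Gaussian prior distribution $N(0, \Lambda)$, where $\Lambda$
satisfies Condition 5. Then for a sufficiently large constant $C > 0$,

$$\Pi\Big( f \in \mathbb{H}_1 : \|f - f_0\|_1 > C n^{-\frac{\gamma \wedge \delta}{2\alpha + 2\delta + 1}} \,\Big|\, Y \Big) \to 0$$

in $P_{f_0}$-probability as $n \to \infty$.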

We therefore obtain the minimax rate of convergence only when the prior smoothness 
matches the true unknown smoothness. While this prior is not adaptive, it is reassuring that 
if the true smoothness is known then the optimal rate of convergence is attainable. Given 
that this result is obtained using the testing approach introduced in [10], it should be possible
to apply the ideas of [28] in using a Gaussian random field with inverse Gamma bandwidth 
to construct an adaptive Gaussian prior. However, we do not pursue such an argument 
here since it is beyond the scope of the present article. Consider now the severely ill-posed 
analogue. 

Proposition 2.6. Suppose that $A$ satisfies Condition 3, that $f_0 \in H^\gamma(\mathbb{H}_1)$ for
some $\gamma > 0$, and assign $f$ the Gaussian prior distribution $N(0, \Lambda)$, where $\Lambda$
satisfies Condition 5. Then for a sufficiently large constant $C > 0$,

Ui /£M i: ||/-/o||i >C(logn) e 

in Ff -probability as n — >■ 00. 

A gap arises in our rates when the prior undersmooths (i.e. 7 + (3/2 < 5), since in the 
case of the heat equation (j3 = 2), [IT] obtain rate (logn)~~. This gap appears to arise 
in Lemma 4.2 from our bound for the covering number of the unit ball of the RKHS of $Af$,
which is used to lower bound the small-ball probability of Af using the techniques of [18] . 
This lower bound seems difficult to improve and so this gap may be an artefact of our proof. 

2.3 Uniform wavelet series 

The approach used in this section can be generalized to any band-limited orthonormal basis
for a general inverse problem in the sense of Condition 1. However, for ease of exposition,
we restrict ourselves to the specific case of periodic deconvolution using wavelets. Therefore,
consider the case of deconvolution under the standard white noise model on $[0, 1]$ described in
Section 1.4.1 so that $A$ is given by (1.5) with SVD given by the Fourier basis. Suppose that
we have an a-priori belief that the true function $f_0$ satisfies some Hölder smoothness condition
rather than a Sobolev condition. We shall expand upon the uniform wavelet series introduced
in [11] by creating a hierarchical prior that uniformly distributes the wavelet coefficients on
a Hölder ball of random radius.




Let $(\Phi, \Psi)$ denote the Meyer scaling and wavelet function (see [21] for more details). As
usual, define the dilated and translated wavelet at resolution level $j$ and scale position
$k/2^j$ by $\Phi_{jk}(x) = 2^{j/2} \Phi(2^j x - k)$, $\Psi_{jk}(x) = 2^{j/2} \Psi(2^j x - k)$ for
$j, k \in \mathbb{Z}$. The system of wavelet functions provides a multiresolution analysis of
$L^2(\mathbb{R})$. By periodizing the wavelet functions

$$\phi_{jk}(x) = \sum_{m \in \mathbb{Z}} \Phi_{jk}(x + m), \qquad \psi_{jk}(x) = \sum_{m \in \mathbb{Z}} \Psi_{jk}(x + m),$$

we obtain a natural multiresolution analysis for periodic functions in $L^2([0, 1])$. We thus have
the following expansion for any periodic function $f \in L^2([0, 1])$:

$$f = \sum_{k=0}^{2^{j_0} - 1} \alpha_{j_0 k} \phi_{j_0 k} + \sum_{l=j_0}^\infty \sum_{k=0}^{2^l - 1} \beta_{lk} \psi_{lk},$$

where the wavelet coefficients are given by $\alpha_{jk} = \langle f, \phi_{jk} \rangle_{L^2}$ and
$\beta_{lk} = \langle f, \psi_{lk} \rangle_{L^2}$.

Meyer wavelets are band limited: in particular the Fourier transform $\mathcal{F}[\Psi]$ over
$\mathbb{R}$ satisfies $\mathrm{supp}(\mathcal{F}[\Psi]) \subset \{w : |w| \in [2\pi/3, 8\pi/3]\}$.
This implies that the periodized wavelets are themselves band-limited with
$\mathrm{supp}(\mathcal{F}_{\mathbb{T}}[\psi]) \subset \mathbb{Z} \cap \{w : |w| \in [2\pi/3, 8\pi/3]\}$
(c.f. Theorem 8.31 in [9]), where
$\mathcal{F}_{\mathbb{T}}[\psi](m) = \int_0^1 \psi(x) e^{-2\pi i m x}\,dx$ denotes the $m$th Fourier
coefficient of $\psi$. In particular, each wavelet function has finite Fourier series and so the
periodized Meyer wavelet basis satisfies Condition 1. As mentioned in the introduction,
band-limited wavelets have been employed to great effect in the deconvolution problem by a number
of authors (see for example [14, 22] for references).

In [11], it is assumed that a quantitative upper bound is known on the $C^s$-norm of the
unknown function. We shall relax this to the case where it is simply known that $\|f_0\|_{C^s} < \infty$.
A natural way to circumvent this problem is to treat the unknown radius $B$ of our Hölder
ball as a hyperparameter and assign to it a prior distribution, thus creating a hierarchical
model. Assign to $B$ a probability distribution $H$, which for simplicity we restrict to the
natural numbers $\mathbb{N}$, with probability mass function $h$. Given $B$, we then consider the periodic
function

$$U_\delta(x) = u\,\phi(x) + \sum_{l=0}^{\infty} \sum_{k=0}^{2^l-1} 2^{-l(\delta+1/2)}\, u_{lk}\, \psi_{lk}(x),$$

where $u, u_{lk} \sim U(-B, B)$ are i.i.d. Now by the wavelet characterization of the Besov spaces
$B^\delta_{\infty\infty}([0,1])$ (see for instance Definition 1 of [33]), we have that $U_\delta \in C^\delta([0,1]) = B^\delta_{\infty\infty}([0,1])$
almost surely and in particular $\|U_\delta\|_{C^\delta} \le B$. Denote the law of $U_\delta$ given $B$ by $\Pi^{\delta,B}$ so that
our full prior can be expressed as

$$\Pi^{\delta,H} = \sum_{r=1}^{\infty} h(r)\, \Pi^{\delta,r},$$

giving a sieve-type prior. We consider only the mildly ill-posed case.
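The two-stage structure of this prior (first the radius, then the coefficients) can be illustrated with a minimal Python sketch. It only draws wavelet coefficients and does not evaluate Meyer wavelets. The names `sample_radius`, `sample_coefficients` and `draw_from_prior`, the particular mass function $h(r) \propto e^{-Dr^\nu}$, and the truncations `r_max` and `levels` are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sample_radius(rng, D=1.0, nu=1.0, r_max=200):
    """Draw B from the (hypothetical) pmf h(r) proportional to exp(-D r^nu) on {1, ..., r_max}."""
    r = np.arange(1, r_max + 1)
    weights = np.exp(-D * r.astype(float) ** nu)
    return int(rng.choice(r, p=weights / weights.sum()))

def sample_coefficients(rng, delta, B, levels):
    """Draw u and beta_{lk} = 2^{-l(delta+1/2)} u_{lk}, with u, u_{lk} i.i.d. U(-B, B).

    Only the first `levels` resolution levels are generated (a truncation
    introduced here; the prior itself uses all levels).
    """
    u = rng.uniform(-B, B)
    betas = [2.0 ** (-l * (delta + 0.5)) * rng.uniform(-B, B, size=2**l)
             for l in range(levels)]
    return u, betas

def draw_from_prior(delta, levels=8, seed=0):
    """One draw from the hierarchical prior: radius first, then coefficients."""
    rng = np.random.default_rng(seed)
    B = sample_radius(rng)
    u, betas = sample_coefficients(rng, delta, B, levels)
    return B, u, betas
```

By construction every draw satisfies $|u| \le B$ and $2^{l(\delta+1/2)}|\beta_{lk}| \le B$, which is the coefficient form of the Hölder-ball constraint used above.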

Proposition 2.7. Suppose that $A$ is of the form (1.5) and satisfies Condition 2 and that
$f_0$ is periodic and in $C^\gamma([0,1])$ for some $\gamma > 0$. Suppose that the distribution $H$ satisfies
$h(r) \ge e^{-Dr^\nu}$ for all $r \in \mathbb{N}$ and $1 - H(r) \le e^{-Dr^\nu}$ as $r \to \infty$ for some constants $D > 0$ and
$1/\delta < \nu \le \infty$. Then there exists a finite constant $C$ such that

$$\Pi^{\delta,H}\big(f \in \mathcal{P} : \|f - f_0\|_{L^2} \ge C\xi_n \mid Y\big) \to 0$$

in $P_{f_0}$-probability as $n \to \infty$, where



2a+2(«-l/i/)+l if ft < ry _|_ A 

2 Q +2 7 +i (i g n yi if 8 = ^ + 1 



where rj = ^a+^y+i^ • V H satisfies the sharper tail condition 1 — H(r) < exp (— e Dr "} as 
r — > oo for some constants D > and v > 0, iaen t/ie rate improves to 

s , 
£ n = re 2 Q +25+i (logn) r ' 

/or all 5 < 7 , ^ere rf = (2a+1 ) ( J^ (1/ " )) . 

As well as the prior smoothness, the thickness of the tail of $H$, as measured by $\nu$, affects
the rate. When $\delta < \gamma + 1/\nu$, we attain the optimal rate of convergence for a $(\delta - 1/\nu)$-
smooth function, that is we lose $1/\nu$ degrees of smoothness. This is entirely due to the bias
constraint (3.5): the bias of a typical element arising from $\Pi^{\delta,B}$ is proportional to $B$, and the
approximation errors therefore grow on average with the thickness of the tail of $H$. Note that
this penalty disappears (or is relegated to logarithmic terms) if we take $H$ to have compact
support ($\nu = \infty$) or a double exponential tail. We obtain the minimax rate of convergence,
up to logarithmic terms, only if the prior smoothness matches the underlying smoothness of
$f_0$ up to the correction term $1/\nu$. Finally, note that if we take $\nu = \infty$ and the prior oversmooths
the true parameter $f_0$, then we do not have posterior consistency since $f_0$ does not lie in the
support of $\Pi^{\delta,H}$.



3 General contraction results 

To prove posterior contraction in a number of settings, we prove a general result along the
lines of Theorems 2 and 3 of [11] adapted to inverse problems. We quantify the effects of the
operator $A$ through a sequence of factors $\{\delta_k\}$. Consider the set of indices

$$\mathcal{A}_k = \{ i : \langle \phi_m, e_i \rangle_1 \neq 0 \text{ for some } 1 \le m \le k \} \qquad (3.1)$$

and define

$$\delta_k = \inf_{i \in \mathcal{A}_k} |\rho_i|, \qquad (3.2)$$

that is we take the smallest $|\rho_i|$ such that one of the first $k$ basis elements $\phi_1, \dots, \phi_k$ has a
non-zero component in the $e_i$ direction. By Condition 1 we know that for any $k \in \mathbb{N}$, $\mathcal{A}_k$ is
finite and consequently $\delta_k > 0$ and the $\{\delta_k\}$ form a decreasing sequence. Note that if we are
working directly in the spectral basis $\{e_k\}$, we simply recover $\delta_k = \rho_k$.
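Definitions (3.1) and (3.2) can be made concrete with a short Python sketch. It assumes the basis is supplied as a finite coefficient matrix `coeffs[m, i]` $= \langle \phi_{m+1}, e_{i+1} \rangle_1$ and that `rho` holds the corresponding $\rho_i$; the function name `delta_sequence` and the tolerance `tol` are hypothetical choices for illustration.

```python
import numpy as np

def delta_sequence(coeffs, rho, tol=1e-12):
    """Return (delta_1, ..., delta_K) with delta_k = min{|rho_i| : i in A_k}.

    coeffs[m, i] is the coefficient of the (m+1)-th basis element phi along
    e_{i+1}; entries with absolute value below `tol` are treated as zero.
    """
    coeffs = np.asarray(coeffs)
    rho = np.abs(np.asarray(rho, dtype=float))
    active = np.abs(coeffs) > tol                       # phi_m has a component along e_i
    in_A_k = np.logical_or.accumulate(active, axis=0)   # row k-1 is the indicator of A_k
    # Mask indices outside A_k with +inf, then take the row-wise minimum.
    return np.where(in_A_k, rho, np.inf).min(axis=1)
```

For the spectral basis (`coeffs` the identity matrix) with decreasing $|\rho_i|$ this returns $\delta_k = |\rho_k|$. For a basis whose first element already mixes $e_1$ and $e_2$, $\delta_1$ equals $|\rho_2|$.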

Theorem 3.1. Consider the white noise model (1.1) and let $\{\phi_k\}$ be an orthonormal basis
of $\mathbb{H}_1$ satisfying Condition 1. Let $\mathcal{P} \subset \mathbb{H}_1$ and let $\Pi_n$ denote a sequence of priors defined on
a $\sigma$-algebra of $\mathcal{P}$. Let $\varepsilon_n, \xi_n \to 0$ be sequences of positive numbers and $k_n \to \infty$ be a sequence
of positive integers such that $\sqrt{n}\varepsilon_n \to \infty$ as $n \to \infty$,

$$k_n \le c\,n\varepsilon_n^2 \quad \text{and} \quad \frac{\varepsilon_n}{\delta_{k_n}} \le C_1 \xi_n \qquad (3.3)$$

for some $c, C_1 > 0$ and all $n \ge 1$, and where $\delta_k$ is defined by (3.2) with respect to $\{\phi_k\}$.
Denote by $P_m$ the projection operator onto the linear span of $\{\phi_k : 1 \le k \le m\}$ and let $\mathcal{P}_n$ be
a sequence of subsets of

$$\{ f \in \mathcal{P} : \|P_{k_n}(f) - f\|_1 \le C_2 \xi_n \} \qquad (3.4)$$

for some $C_2 > 0$. Moreover, assume that there exists $C > 0$ such that, for sufficiently large
$n$,

$$\Pi_n(\mathcal{P}_n^c) \le e^{-(C+4)n\varepsilon_n^2}, \qquad (3.5)$$

$$\Pi_n\big(f \in \mathcal{P} : \|Af - Af_0\|_2 \le \varepsilon_n\big) \ge e^{-Cn\varepsilon_n^2}. \qquad (3.6)$$

Suppose that $Y$ has law $P_{f_0}$, where $f_0 \in \mathbb{H}_1$ is such that $\|P_{k_n}(f_0) - f_0\|_1 = O(\xi_n)$. Then there
exists a constant $M < \infty$ such that

$$\Pi_n\big(f \in \mathcal{P} : \|f - f_0\|_1 \ge M\xi_n \mid Y\big) \to 0$$

as $n \to \infty$ in $P_{f_0}$-probability.

In an analogy to the frequentist approach, the quantity $\varepsilon_n/\delta_{k_n}$ in (3.3) represents the
variance term of the centred linear estimator used to test (1.2), while $\xi_n$ represents its bias.
In the mildly ill-posed setting of Condition 2, the optimal outcome is to balance these terms
so that (3.3) is an equality (up to constants). Taking $k_n \simeq n\varepsilon_n^2$ gives the optimal result using
this method, yielding rate $\xi_n \simeq n^{\alpha}\varepsilon_n^{2\alpha+1}$.
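The balance above is simple exponent arithmetic, which the following Python sketch makes explicit. It assumes $\varepsilon_n = n^{-a}$ with logarithmic factors ignored, $k_n = n\varepsilon_n^2$ and $\delta_k \simeq k^{-\alpha}$; the function name `balanced_rate_exponents` is a hypothetical helper, not notation from the paper.

```python
from fractions import Fraction

def balanced_rate_exponents(alpha, eps_exponent):
    """Exponents of n for k_n = n * eps_n^2 and xi_n = eps_n / delta_{k_n}.

    Assumes eps_n = n^(-eps_exponent) and delta_k ~ k^(-alpha), so that
    xi_n ~ eps_n * k_n^alpha = n^alpha * eps_n^(2 alpha + 1).
    """
    k_exp = 1 - 2 * eps_exponent             # k_n ~ n^(1 - 2a)
    xi_exp = -eps_exponent + alpha * k_exp   # xi_n ~ n^(-a + alpha (1 - 2a))
    return k_exp, xi_exp

# Example: alpha = 1, gamma = 2 and eps_n = n^(-(alpha+gamma)/(2 alpha + 2 gamma + 1)).
alpha, gamma = Fraction(1), Fraction(2)
a = (alpha + gamma) / (2 * alpha + 2 * gamma + 1)
k_exp, xi_exp = balanced_rate_exponents(alpha, a)
```

With these inputs the sketch gives $\xi_n \simeq n^{-\gamma/(2\alpha+2\gamma+1)}$, the familiar minimax exponent over a Sobolev class of smoothness $\gamma$ (up to the logarithmic factors dropped here).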

In the severely ill-posed setting of Condition 3, it is known (e.g. [5] in the case of density
deconvolution) that the bias strictly dominates the variance as long as the true function is
"rougher" than the operator $A$. By this we mean that if $f_0$ strictly falls within some Sobolev
class, or satisfies some weaker analytic condition than Condition 3, then $\xi_n$ will be of strictly
larger order than $\varepsilon_n/\delta_{k_n}$, so that (3.3) will be a strict inequality (which must be verified in
practice) and we take $k_n = o(n\varepsilon_n^2)$ as $n \to \infty$. Since our method relies on the approximation
properties of the prior, the prior bias is equally important as the true bias in determining the
contraction rate in this case.

From a frequentist perspective, it is entirely reasonable to calibrate the prior according 
to the inverse problem, since the operator A is assumed known. From a pure Bayesian 
perspective this may seem unduly restrictive, since the prior is supposed to represent a priori
knowledge of the unknown element $f$ and should not depend on the observation scheme. We
argue that since Theorem 3.1 already makes implicit use of this knowledge via the choice
of a basis satisfying Condition 1, it is perfectly reasonable to employ full knowledge of the
operator. In particular, for the case of sieve priors in the severely ill-posed setting presented
in Propositions 2.3 and 2.4, we allow the distribution $H$ to depend on the regularity $\beta$ of $A$.

3.1 Proof of Theorem 3.1

A key step in the proof of Theorem 3.1 is the construction of nonparametric tests for suitably
separated alternatives in $\mathbb{H}_1$. The tests are constructed based on the norm of a simple plug-in
estimator of $f_0$, which is then split using a standard bias-variance decomposition. We require
an exponential bound on the type II error of our test and can attain this using Borell's
inequality [3]. We can construct a suitable linear estimator for $f_0$ using band-limited (in the
sense of the $\{e_k\}$-basis) elements in a similar fashion to the deconvolution density estimators
based on Fourier techniques studied in [13] and [22].
Suppose that $\{\phi_k\}$ is an orthonormal basis of $\mathbb{H}_1$ satisfying Condition 1. Writing $\phi_{k,i} =
\langle \phi_k, e_i \rangle_1$ and using that $\{g_i\}$ is the conjugate basis to $\{e_i\}$ for $A$,

$$\langle f, \phi_k \rangle_1 = \Big\langle f, \sum_i \phi_{k,i} e_i \Big\rangle_1 = \Big\langle f, \sum_i \phi_{k,i} \rho_i^{-1} A^* g_i \Big\rangle_1 = \Big\langle Af, \sum_i \phi_{k,i} \rho_i^{-1} g_i \Big\rangle_2 =: \langle Af, \tilde\phi_k \rangle_2,$$

where

$$\tilde\phi_k = \sum_i \rho_i^{-1} \phi_{k,i}\, g_i.$$

Recall that by Condition 1 only finitely many of the $\phi_{k,i}$ are non-zero. In particular, note
that if $\phi_k = e_k$, then we simply have $\tilde\phi_k = \rho_k^{-1} g_k$. In this way, we derive a (not necessarily
orthonormal) basis of the range of $A$ that is conjugate to $\{\phi_k\}$. We can therefore express the
coordinates of $f$ in the $\{\phi_k\}$ basis of $\mathbb{H}_1$ in terms of the action of $\{\tilde\phi_k\}$ on $Af$. Considering
this action, define

$$Y_k := \langle Y, \tilde\phi_k \rangle_2 = \langle f, \phi_k \rangle_1 + \frac{1}{\sqrt{n}} \tilde Z_k,$$

where $\tilde Z_k = \langle Z, \tilde\phi_k \rangle_2$ are (not necessarily independent) mean-zero Gaussian random variables
with covariance $\mathbb{E}\tilde Z_k \tilde Z_l = \langle \tilde\phi_k, \tilde\phi_l \rangle_2$. Thus the sequence $\{Y_k\}$ provides an unbiased estimator
of the coefficients of the true regression function $f$ in the basis $\{\phi_k\}$. The sequence $\{\tilde Z_k\}$
is independent if and only if $\{\tilde\phi_k\}$ forms an orthogonal sequence, which is the case when
$\phi_k = e_k$. This suggests a natural linear estimator of $f$:

$$\hat f_n = \sum_{k=1}^{k_n} Y_k \phi_k,$$

where the resolution level $k_n$ is to be specified. Recall that we write $P_k$ for the orthogonal
projection operator onto the linear span of $\{\phi_l : 1 \le l \le k\}$. The estimator $\hat f_n$ then
decomposes immediately into its bias and variance parts

$$\hat f_n - f = \big(P_{k_n}(f) - f\big) + \frac{1}{\sqrt{n}} \sum_{k=1}^{k_n} \tilde Z_k \phi_k.$$
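In the special case $\phi_k = e_k$ the estimator reduces to dividing the first $k_n$ observed coordinates by $\rho_k$, and the bias-variance split can be checked numerically. The Python sketch below is an illustration only: the function name `simulate_spectral_estimator`, the truth $f_k = k^{-(\gamma+1)}$, the choice $\rho_k = (1+k^2)^{-\alpha/2}$ and the truncation `K` of the infinite sequence are all assumptions made here.

```python
import numpy as np

def simulate_spectral_estimator(n, alpha, gamma, k_n, K=500, seed=0):
    """Simulate Y_k = rho_k f_k + Z_k / sqrt(n) and the truncated estimator.

    Assumes phi_k = e_k, rho_k = (1 + k^2)^(-alpha/2) and the hypothetical
    truth f_k = k^(-(gamma + 1)); K truncates the infinite sequence.
    Returns (squared error, squared bias, squared stochastic part).
    """
    rng = np.random.default_rng(seed)
    k = np.arange(1, K + 1, dtype=float)
    rho = (1.0 + k**2) ** (-alpha / 2.0)
    f = k ** (-(gamma + 1.0))
    Z = rng.standard_normal(K)
    Y = rho * f + Z / np.sqrt(n)              # coordinates of Y in the g_k basis

    f_hat = np.zeros(K)
    f_hat[:k_n] = Y[:k_n] / rho[:k_n]         # <Y, rho_k^{-1} g_k> for k <= k_n

    bias_sq = np.sum(f[k_n:] ** 2)                      # ||P_{k_n} f - f||^2
    var_sq = np.sum((Z[:k_n] / rho[:k_n]) ** 2) / n     # ||f_hat - E f_hat||^2
    err_sq = np.sum((f_hat - f) ** 2)
    return err_sq, bias_sq, var_sq
```

Because the bias lives on coordinates $k > k_n$ and the noise on coordinates $k \le k_n$, the squared error is exactly the sum of the two parts in this orthogonal setting.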

We now construct an exponential inequality for the fluctuations of the random part of
$\hat f_n$, that is the centred term $\hat f_n - \mathbb{E}\hat f_n$, following the method presented in Section 3.1 of [11].
By the Hahn-Banach theorem and the separability of $\mathbb{H}_1$, there exists a countable and dense
subset $B_0$ of the unit ball of $\mathbb{H}_1' = \mathbb{H}_1$ such that

$$\|f\|_1 = \sup_{h \in B_0} |\langle h, f \rangle_1|.$$

The norm of the variance part of our estimator can thus be written

$$\|\hat f_n - \mathbb{E}\hat f_n\|_1 = \sup_{h \in B_0} \frac{1}{\sqrt{n}} \Big| \sum_{k=1}^{k_n} \tilde Z_k \langle h, \phi_k \rangle_1 \Big| =: \sup_{h \in B_0} |G(h)|,$$

where $G = (G(h) : h \in B_0)$ is a centred Gaussian process indexed by a countable set.
Applying the version of Borell's inequality for the supremum of Gaussian processes ([19],
page 134) gives

$$P\big(\|\hat f_n - \mathbb{E}\hat f_n\|_1 - \mathbb{E}\|\hat f_n - \mathbb{E}\hat f_n\|_1 \ge x\big) = P\Big(\sup_{h \in B_0} |G(h)| - \mathbb{E}\sup_{h \in B_0} |G(h)| \ge x\Big) \le e^{-x^2/(2\sigma^2)}, \qquad (3.7)$$
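where $\sigma^2 = \sup_{h \in B_0} \mathbb{E}G(h)^2$ is the weak variance of $G$. By Jensen's inequality, the expectation
can be controlled as

$$\mathbb{E}\|\hat f_n - \mathbb{E}\hat f_n\|_1 = \frac{1}{\sqrt{n}} \mathbb{E}\Big(\sum_{k=1}^{k_n} \tilde Z_k^2\Big)^{1/2} \le \frac{1}{\sqrt{n}} \Big(\sum_{k=1}^{k_n} \mathbb{E}\tilde Z_k^2\Big)^{1/2} = \frac{1}{\sqrt{n}} \Big(\sum_{k=1}^{k_n} \|\tilde\phi_k\|_2^2\Big)^{1/2}.$$

Recall the definitions (3.1) and (3.2) of the sets $\mathcal{A}_k$ and quantities $\delta_k$. Since the $\{\delta_k\}$ form a
decreasing sequence, for $k \le k_n$,

$$\|\tilde\phi_k\|_2^2 = \sum_{i \in \mathcal{A}_k} \rho_i^{-2} |\phi_{k,i}|^2 \le \frac{1}{\delta_k^2} \sum_{i \in \mathcal{A}_k} |\phi_{k,i}|^2 \le \frac{1}{\delta_{k_n}^2},$$

so that

$$\mathbb{E}\|\hat f_n - \mathbb{E}\hat f_n\|_1 \le \frac{1}{\delta_{k_n}} \sqrt{\frac{k_n}{n}}.$$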

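Considering the weak variance $\sigma^2$, we have that for $h \in B_0$,

$$n\,\mathbb{E}G(h)^2 = \sum_{k=1}^{k_n} \sum_{l=1}^{k_n} \langle h, \phi_k \rangle_1 \langle h, \phi_l \rangle_1\, \mathbb{E}\tilde Z_k \tilde Z_l = \sum_{k=1}^{k_n} \sum_{l=1}^{k_n} \langle h, \phi_k \rangle_1 \langle h, \phi_l \rangle_1 \langle \tilde\phi_k, \tilde\phi_l \rangle_2 = \Big\| \sum_{k=1}^{k_n} \langle h, \phi_k \rangle_1 \tilde\phi_k \Big\|_2^2.$$

While the basis $\{\tilde\phi_k\}$ is in general not orthogonal, it is sufficient that each finite sequence
forms a Riesz sequence (whose constants vary with the number of terms). Since the $\mathcal{A}_k$'s
form an increasing sequence of sets and using the definition of $\tilde\phi_k$,

$$\Big\| \sum_{k=1}^{k_n} \langle h, \phi_k \rangle_1 \tilde\phi_k \Big\|_2^2 = \sum_{i \in \mathcal{A}_{k_n}} \rho_i^{-2} \Big| \sum_{k=1}^{k_n} \langle h, \phi_k \rangle_1 \phi_{k,i} \Big|^2 \le \frac{1}{\delta_{k_n}^2} \sum_{i=1}^{\infty} \Big| \Big\langle \sum_{k=1}^{k_n} \langle h, \phi_k \rangle_1 \phi_k, e_i \Big\rangle_1 \Big|^2 = \frac{1}{\delta_{k_n}^2} \Big\| \sum_{k=1}^{k_n} \langle h, \phi_k \rangle_1 \phi_k \Big\|_1^2.$$

Combining these yields

$$\sigma^2 \le \frac{1}{n\delta_{k_n}^2} \sup_{h \in B_0} \|h\|_1^2 \le \frac{1}{n\delta_{k_n}^2}.$$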

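Substituting these bounds into Borell's inequality gives

$$P\Big(\|\hat f_n - \mathbb{E}\hat f_n\|_1 \ge \frac{1}{\delta_{k_n}}\sqrt{\frac{k_n}{n}} + x\Big) \le \exp\Big(-\frac{1}{2} n\delta_{k_n}^2 x^2\Big),$$

which, upon letting $x = \sqrt{2L}\,\varepsilon_n/\delta_{k_n}$ for some constant $L$, gives

$$P\Big(\|\hat f_n - \mathbb{E}\hat f_n\|_1 \ge \frac{1}{\delta_{k_n}}\Big(\sqrt{2L}\,\varepsilon_n + \sqrt{\frac{k_n}{n}}\Big)\Big) \le e^{-Ln\varepsilon_n^2}.$$

Since $k_n \le cn\varepsilon_n^2$ for some constant $c > 0$ and $\varepsilon_n/\delta_{k_n} \le C_1\xi_n$, we have for sufficiently large $n$,

$$P\big(\|\hat f_n - \mathbb{E}\hat f_n\|_1 \ge M\xi_n\big) \le e^{-Ln\varepsilon_n^2} \qquad (3.8)$$

for some constant $M = M(L, A)$ large enough.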

Proof of Theorem 3.1. Following the proof of Theorem 2.1 in [10] almost exactly line by line,
but using formula (1.4) for the posterior distribution in the inverse setting, we recover an
analogous theorem for the sampling model (1.1). In particular, it is sufficient to construct
tests (indicator functions) $\phi_n = \phi_n(Y; f_0)$ such that

$$\mathbb{E}_{f_0}\phi_n \to 0, \qquad \sup_{f \in \mathcal{P}_n : \|f - f_0\|_1 \ge M\xi_n} \mathbb{E}_f(1 - \phi_n) \le e^{-(C+4)n\varepsilon_n^2}, \qquad (3.9)$$

where the constant $C > 0$ matches that in (3.6) (the analogue of (2.4) in [10] for (1.1)).
Recall that we are testing the hypotheses (1.2).

We can now consider the plug-in test $\phi_n(Y) = 1\{\|\hat f_n - f_0\|_1 \ge M_0\xi_n\}$, where the constant
$M_0$ is to be selected below. Recall that we have assumed that the contraction rate $\xi_n$ satisfies
$\varepsilon_n/\delta_{k_n} \le c\xi_n$ for some $c > 0$ and all $n \ge 1$. The type-I error satisfies

$$\mathbb{E}_{f_0}\phi_n = P_{f_0}\big(\|\hat f_n - f_0\|_1 \ge M_0\xi_n\big) \le P_{f_0}\big(\|\hat f_n - \mathbb{E}_{f_0}\hat f_n\|_1 \ge M_0\xi_n - \|\mathbb{E}_{f_0}\hat f_n - f_0\|_1\big).$$

By hypothesis, the bias of $f_0$ satisfies $\|P_{k_n}(f_0) - f_0\|_1 \le D\xi_n$ for some $D > 0$. Letting $L_1 > 0$
be some constant, we can take $M_0$ sufficiently large so that applying (3.8) gives

$$\mathbb{E}_{f_0}\phi_n \le P_{f_0}\big(\|\hat f_n - \mathbb{E}_{f_0}\hat f_n\|_1 \ge (M_0 - D)\xi_n\big) \le e^{-L_1 n\varepsilon_n^2} \to 0$$

as $n \to \infty$.

Now consider $f \in \mathcal{P}_n$ such that $\|f - f_0\|_1 \ge M\xi_n$. Letting $L_2 > 0$ be some constant, we
can pick $M$ sufficiently large so that applying the triangle inequality and (3.8) gives

$$\mathbb{E}_f(1 - \phi_n) = P_f\big(\|\hat f_n - f_0\|_1 < M_0\xi_n\big)$$
$$\le P_f\big(\|f_0 - f\|_1 - \|f - \mathbb{E}_f\hat f_n\|_1 - \|\mathbb{E}_f\hat f_n - \hat f_n\|_1 < M_0\xi_n\big)$$
$$\le P_f\big((M - C_2 - M_0)\xi_n < \|\mathbb{E}_f\hat f_n - \hat f_n\|_1\big) \le e^{-L_2 n\varepsilon_n^2},$$

since by assumption $\sup_{f \in \mathcal{P}_n} \|f - \mathbb{E}_f\hat f_n\|_1 \le C_2\xi_n$. This verifies (3.9). □
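The plug-in test used in this proof is easy to state in code when $\phi_k = e_k$. The Python sketch below is only an illustration under that assumption: the function names `observe` and `plug_in_test`, the sequence truncation, and the numerical threshold in the usage note are hypothetical and are not calibrated to the constants $M_0$, $\xi_n$ of the theorem.

```python
import numpy as np

def observe(f, rho, n, rng):
    """Coordinates of Y in the g_k basis: Y_k = rho_k f_k + Z_k / sqrt(n)."""
    return rho * f + rng.standard_normal(f.size) / np.sqrt(n)

def plug_in_test(Y, rho, f0, k_n, threshold):
    """Return 1 (reject f0) if ||f_hat - f0|| >= threshold, else 0.

    f_hat keeps the first k_n coordinates Y_k / rho_k and sets the rest to 0.
    """
    f_hat = np.zeros_like(f0, dtype=float)
    f_hat[:k_n] = Y[:k_n] / rho[:k_n]
    return int(np.linalg.norm(f_hat - f0) >= threshold)
```

For example, with $\rho_k = (1+k^2)^{-1/2}$, $f_{0,k} = k^{-2}$, $n = 10^6$ and $k_n = 10$, data generated from $f_0$ give a statistic far below a threshold of $0.2$, whereas data generated from $f_0 + e_1$ give a statistic close to $1$.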

4 Proofs 

4.1 Proofs of Section 2.1 (Sieve priors)

Proof of Proposition 2.1. By hypothesis, the true regression function takes the form $f_0 =
\sum_{k=1}^{m_0} f_{0,k} e_k$ for some $m_0 \in \mathbb{N}$. We first verify the small-ball condition (3.6). Let $f$ be a finite
series generated from $\Pi$, conditionally on $M = m_0$. As noted in Section 1.2, since $A$ satisfies
Condition 2, it is sufficient to prove (1.3) to establish (3.6). Therefore,



mo 



p (ii/ - / || H _„ < e n ) = p yr; - / 0ifc | 2 (i + k 2 y a < e 2 n \ 

>w(\f k - f ,k\ 2 (l + k 2 r a <—, for all A; = 1,..., m ) (4.1) 

=a,fi A _ A ,i<s s a^) 

by the independence of the /fc's. Now if X is complex-valued with density q : C — > [0, oo) 
satisfying Condition 01 then for all 2 £ C and f > 0, 



rt e -d(\z\+rr dr>27r Dte- d ^+V W . 



F(\X -z\<t)> [ f* De~ d \ z+reie \ w dr d9 > 2ttD [ 
Jo Jo Jo 

(4.2) 

If X is real-valued, then the same estimate holds without the 7r term; we shall therefore stick 
to the real-valued case, but note that everything below holds also in the complex case with 
slightly different constants. Let a n ^ = — — and note that for fixed k, a n ^ — > as 
n — > 00 since e n — > 0. Thus there exists E > such that a n & < E for all 1 < k < mj and 
n > 1. Using (|4.2p . we lower bound the right-hand side of (j4. 1[) by 

mo / mo / \ mo 



n 2D^e-^r(\f^ n , k r > Ciexp K- log _ ^^g-- 1 (i/ 0)fe r + <*) 

fe=i Tfc \fe=i V r fc / fe=1 

/ mo / (1 + jfc 2 W 2 \ 

> Ci exp m log e n + log - C : 



>C 3 (f ,{T k },a,q)e C ^", 

where we have used that (a + b) w < 2 w ~ 1 (d w + b w ) for a,b > and > 1. Now since mo is 
fixed and h(mo) > by assumption, 

II(/ £ V ■ \\Af - Af \\ 2 < e n ) > h(m )C 3 e c * l °^ > e CsIog£ " 

for some constant C5 > 0. The choice E n = ( J then satisfies (|3.6|) . 

Consider now the bias constraint (|3.5p . Take A; n to be an integer satisfying Line 2 < 
< L2n£ 2 l for some constants L±,L 2 , and let = {/ € Hi : / = ^fc=i/fc e fc}- By the 
assumptions on h, we have n("P£) < Ce" 6 ^ < e" Lnf S where L is a constant that can be made 
arbitrarily large by choosing L\ sufficiently large. Now for all / £ V n , we have the trivial 
bias result || / — ffc„(/)||i = 0, so that choosing L large enough to match the constant used 
to establish (|3.6p above, we verify ()3.5|) . Finally, for the true function /o the bias condition 
follows immediately since ||/o — ffe„/o|li = for k n > mo- Applying Theorem 13. II with 

tn = ^< C£ n K = CV^ 1 = C (1 ° g 



completes the proof. □ 
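Proof of Proposition 2.2. By the triangle inequality

$$\|f - f_0\|_{H^{-\alpha}} \le \|f - P_{j_n}(f_0)\|_{H^{-\alpha}} + \|P_{j_n}(f_0) - f_0\|_{H^{-\alpha}},$$

where $j_n$ is to be selected below. Since $f_0 \in H^\gamma$,

$$\|P_{j_n}(f_0) - f_0\|_{H^{-\alpha}}^2 = \sum_{k=j_n+1}^{\infty} |f_{0,k}|^2 (1+k^2)^{-\alpha} \le j_n^{-2(\alpha+\gamma)} \|f_0\|_{H^\gamma}^2.$$

Taking $j_n \simeq \varepsilon_n^{-1/(\alpha+\gamma)}$ gives

$$P\big(\|f - f_0\|_{H^{-\alpha}} \le \varepsilon_n\big) \ge P\big(\|P_{j_n}(f_0) - f\|_{H^{-\alpha}} \le c'\varepsilon_n\big)$$

for some $c' > 0$. Let $a_{n,k} = \varepsilon_n (1+k^2)^{\alpha/2}/\sqrt{j_n}$ and suppose that $f$ is a finite series in the $\{e_k\}$ basis
of degree $j_n$. Then using (4.2) as in the proof of Proposition 2.1,

$$P\big(\|f - P_{j_n}(f_0)\|_{H^{-\alpha}} \le \varepsilon_n\big) \ge \prod_{k=1}^{j_n} P\big(|f_k - f_{0,k}| \le a_{n,k}\big) \ge \prod_{k=1}^{j_n} D' a_{n,k} \tau_k^{-1} e^{-d\tau_k^{-w}(|f_{0,k}| + a_{n,k})^w}$$
$$\ge \exp\Big( j_n \log D' + \sum_{k=1}^{j_n} \log\frac{a_{n,k}}{\tau_k} - C_2 \sum_{k=1}^{j_n} \tau_k^{-w}\big(|f_{0,k}|^w + a_{n,k}^w\big) \Big). \qquad (4.3)$$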

Using the upper bound r fc < /3 4 (l+/c 2 )( a+1 )/ 2 , gives Efeli lo S (^H 1 + k 2 ) a / 2 ) > -E^logjn 
for some E x > 0. Since / G ff 7 , we have |/ , fc | < (1 + A; 2 )" 7 / 2 ||/ || H7 < C(/ )fc - T for all 

A; > 1. Moreover, for < i n , note that a n ,fc ~ jn a_7_1 ^ 2 (l + /c 2 ) a / 2 < E 2 jn"'~ 1 ^ 2 . Substitut- 
ing these bounds into (|4. 3[) and using that r fc > JB 3 (1 + A; 2 )~ 7 °/ 2 (logfe)- 1 / u ' yields the lower 
bound 



exp I C 3 Jn bg £n - C 4 Jn log J n + 2^ lo S I _ ) ~ ° 5 1^ 

V k=l V TA: / k=l 



■yw , — (7+1/2)ion 



> exp ^-C 6 J n logJ n - Elj n logJ„ - C 7 £ log fcj > exp (-C 8 j n logj„) , 

where we have also used that loge n ~ — logj n . In conclusion we have shown that 



P(ll/ " M\h-° < e n ) > h(j n )e- c ^ l °^ > Bl e- b ^- c ^ l °^ > e 
Condition (|3.6|) is then satisfied by the choice e n 



„ -1/(0 + 7), 

-Cge n log ■ 



c* + 7 

logn \ 2a + 2 7 + l 

n 



Again take V n = {/ = X^fc=i /fc e fc}i where A; n is an integer satisfying L\ne n < k n < Z^ne 2 . 
Proceeding as above, we get || / — Pk n {f)\\\ = for an / £ P« and II('P^) < e _Lne ™ for a 
suitable constant L, thereby verifying (13.5p . This yields contraction rate 



£ o (2ct + l)(a + 7 ) 7 

E, n = — - < Ce n (ne n ) a = C(logn) 2 Q +2 7 +i n 2^+27+1 



Finally, for the true regression element /o, 

ll/o-i^C/oJII^^H/olU^ 
as required. Applying Theorem 13.11 completes the proof. □ 



fo " P kn {h)\\l < CK 1 \\h\\ H n - (™4)~ 7 = (logn)-^^n-^™ < £ n 
Proof of Proposition 2.3. By exactly the same reasoning as in the proof of Proposition 2.1,
(|3.6p is satisfied with e n = \J (log n) jn. Take k n to be an integer satisfying (L\ logn) 1// ^ +1 ^ < 

K < (Lalogn) 1 /^!) for some constants L\ and Again taking V n = |/ = X/fe=i /fc e fc| 

yields < e" bfc " < e - Ln£ ™ for some constant L that can be made arbitrarily large by 

increasing L\. This verifies (|3.5|) and the bias condition on /o follows exactly as above. Since 
the bias in both cases is equal to for sufficiently large n, we can apply Theorem 13.11 with 
contraction rate 

Cn = |n < CeJl + klr'^e^ < C"(logn)5 + /^n- 1 /2 eC o(L 2 iog„.)/'/(^i) = 

□ 

Proof of Proposition 2.4. The proof is similar to that of Proposition 2.2, though we must
notably keep more careful track of the constants involved due to the exponentiation resulting
from the severe ill-posedness. If $A$ satisfies Condition 3, consider the norm induced
analogously to the Sobolev norm $H^{-\alpha}$ in the mildly ill-posed case:

$$\|f\|_A^2 := \sum_{k=1}^{\infty} |f_k|^2 (1+k^2)^{-\alpha_0} e^{-2c_0 k^\beta}.$$

Taking j n ^ ai+/y ^ e~ c °^ n+1 ^ ~ e n and using the same truncation argument as in the proof of 
Proposition 12.21 gives \\Pj n (fo) — /o|U < ce n for some constant c > 0. Thus for / a finite 
series of degree j n in the {e^} basis (and using that q standard normal satisfies Condition H] 
for w = 2), 

/ 3n / ~ \ Jn \ 

P (H^nC/o) " f\\ A < os) > exp j n logd +^log - C 2 ^r fc 7 2 (|/ , fc | 2 + a\ k ) , 

V k=i ^ k ' k=i J 

(4.4) 

where a n , k = j~ 1/2 e n (l + k 2 ) a ^ 2 e c ° k ^ < Cj~ 1 ~ 1/2 e c °^-^ +1 ^ < Cj' 1 ' 1 ' 2 for k < j n and 
by the definition of j n . Now since Tfc = (l + &: 2 )~2~4 and /o £ H 1 ', we have that 



X] 10 ^^ 1 "™^) ^ Jnlog^n - -J„ logjn > Elj^ +1 , 
k=l 

IX 2 im 2 = E fc2m_27fc27 i/o, fc i 2 < / 25 - 2 ^ vo \\M\ 2 m , 



k=l k=l 

Jn Jn 



E^<k < Cj-^^Y, k2{8+1/2) $ ^n +2(5 " 7) 
k=l k=l 

for some constants E\,E-i > 0. Substituting these into (|4.4p gives the lower bound exp (— Csjn +9 ) ■, 
where 9 = max (/3, 2(5 — 7)). In conclusion, the small ball probability satisfies 



En, 



P(|L4/ - 4/b|| 2 < e„) > fcUJe-^' > B ie - C ^ +6 > e" C5 (^- 

1+9 . 

so that ([3.6p is satisfied by the choice e n = (logra) 2 " n ' . 
Take $k_n$ to be an integer satisfying $(a_1 \log n)^{1/\beta} \le k_n \le (a_2 \log n)^{1/\beta}$ for some constants

6-9/2 

a\ and d2- For this choice of k n , f|3.3j) is verified for the choice £ n = (logn) 1 3 : 
p- < De n (l + kl) a °/ 2 e Cok " < D'(log n) 2 "°w +1 e ^~i/2) logn = (^) 

as long as we take co«2 < 1/2- Recall that for / £ supp(n m ) we have Karhunen-Loeve 
expansion / = TfcCfcefc, where {Cfc} are i.i.d. standard normal random variables. Thus 

for any such /, we can bound the bias by ||-Pfe n (/) — f\\\ < Y^k=k n +i r fcCf ■ We verify (|3.5|1 
by applying Borell's inequality in a similar fashion to that used in the proof of Theorem 13.11 
Using the same notation, write \ \Pk n (f) — /Hi = sn Ph£B G n {h), where Bq is a weak*-dense 
subset ofj/iSEIirll/il^^l} and G n is the Gaussian processes 

00 

G n (h) = (h,P kn (f)-f)i= E ^Cfc(^e fc )i. 

k = - kji~\-\ 

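We can control the bias and weak variance terms as follows. Using that $\sum_{j=k+1}^{\infty} j^{-w} \le
k^{1-w}/(w-1)$ for $w > 1$ and applying Jensen's inequality to the bias gives

$$\mathbb{E}\|P_{k_n}(f) - f\|_1 \le \Big(\sum_{k=k_n+1}^{\infty} \tau_k^2\Big)^{1/2} \le Ck_n^{-\delta}.$$

For the variance, note that for any $h \in B_0$,

$$\mathbb{E}G_n(h)^2 = \mathbb{E}\Big|\sum_{k=k_n+1}^{\infty} \tau_k \zeta_k \langle h, e_k \rangle_1\Big|^2 = \sum_{k=k_n+1}^{\infty} \tau_k^2 |\langle h, e_k \rangle_1|^2 \le \tau_{k_n+1}^2 \|h\|_1^2 \le Ck_n^{-2\delta-1}.$$

Using these bounds, apply Borell's inequality for the supremum of a Gaussian process as in
(3.7) with $x = \sqrt{L' n\varepsilon_n^2 k_n^{-2\delta-1}}$ to obtain

$$P\Big(\|P_{k_n}(f) - f\|_1 \ge k_n^{-\delta}\big(C + \sqrt{L'n}\,\varepsilon_n k_n^{-1/2}\big)\Big) \le e^{-Ln\varepsilon_n^2}, \qquad (4.5)$$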

where L' is some constant that increases with L. Substituting in our choices of e n and k n 
yields that for n > N, 

p(\\Pk n (f)-f\\i>M(N,L)(logn)-^<e- Ln£ ", 

where the constant M increases with L. Letting V n = {/ G Hi : \\Pk n {f) — f\\i < M£ n } 

2 s-e/2 
for a sufficiently large constant M, we have 11(7^) < e ™ for £ n = (logn) P . This is 

satisfied by our above choice of e n and so, choosing L sufficiently large to match the constant 

obtained in the small-ball probability above, this verifies f)3. 5f) . Lastly, as /o S H" 1 ', then 

\\Pk n (fo) — /0II1 < Ckn 1 = 0(£ n ) exactly as above. Apply Theorem 13.11 to finish. □ 

4.2 Proofs of Section 2.2 (Gaussian priors)

The small-ball asymptotics of a Gaussian measure in a Hilbert space have been exactly
characterized by Sytaya [25] and using the techniques of large deviations in [8]. However,
while exact, the asymptotic expression is rather complicated and relies on the solution of
an implicit equation that does not yield an explicit rate in terms of $\varepsilon$. We therefore obtain
suitable lower bounds using either direct lower bound methods [12] or the link with the metric
entropy of the unit ball of the RKHS [18] (both of which yield the same result).



EG n (hf 



E 



00 

£ 

1 k A^tt, — |— 1 



T k (k(h,e k ) 
As mentioned above, a Gaussian distribution has support equal to the closure of its
RKHS $\mathbb{H}$ and so posterior consistency is only achievable when $Af_0$ is contained in this set.
Since $f$ is a Gaussian random variable in a Hilbert space with Karhunen-Loève expansion
$f \overset{d}{=} \sum_k \tau_k \zeta_k e_k$, where the $\{\zeta_k\}$ are i.i.d. standard normal random variables, we can easily
characterize its RKHS in terms of ellipsoids (see [26] for more details). Letting $\mathbb{H}_f$ denote
the RKHS of $f$, we have that if $a = \sum_k a_k e_k$, then

$$\|a\|_{\mathbb{H}_f}^2 = \sum_{k=1}^{\infty} \frac{a_k^2}{\tau_k^2}.$$

The RKHS norm therefore consists of a weighted $\ell^2$-norm, weighting the eigenvectors of $\Lambda$
with the inverse of its eigenvalues. Recall that the concentration function of a Gaussian
random variable $W$ in a Banach space $(\mathbb{B}, \|\cdot\|)$ with RKHS $\mathbb{H}$ is defined as

$$\phi_{w_0}(\varepsilon) := \inf_{h \in \mathbb{H} : \|h - w_0\| \le \varepsilon} \|h\|_{\mathbb{H}}^2 - \log P(\|W\| < \varepsilon).$$

By Theorem 2.1 of [27], choosing $\varepsilon_n$ to satisfy $\phi_{w_0}(\varepsilon_n) \le n\varepsilon_n^2$ is sufficient to obtain the lower
bound $P(\|W - w_0\| < 2\varepsilon_n) \ge e^{-n\varepsilon_n^2}$, and consequently establish (3.6).

We firstly establish upper bounds for the concentration function $\phi_{Af_0}$ of the Gaussian
random variable $Af$. When the prior oversmooths the true parameter, the approximation
error in $\phi_{Af_0}(\varepsilon)$ dominates as $\varepsilon \to 0$, whereas when it undersmooths the centred small ball
probability dominates. This is quantified by the following lemma.

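Lemma 4.1. Suppose that $f \sim N(0, \Lambda)$, where $\Lambda$ satisfies Condition 5 and let $f_0 \in H^\gamma(\mathbb{H}_1)$
for some $\gamma > 0$. Then $Af$ is a Gaussian random variable in the Hilbert space $\mathbb{H}_2$. If $A$ satisfies
Condition 2, then $Af$ has RKHS equal to $H^{\alpha+\delta+1/2}(\mathbb{H}_2)$ (where $H^s(\mathbb{H}_2)$ is the Sobolev scale
with respect to $\{g_k\}$) and the concentration function of $Af$ satisfies

$$\phi_{Af_0}(\varepsilon) \le C \begin{cases} \varepsilon^{-\frac{2\delta-2\gamma+1}{\alpha+\gamma}} & \text{if } \gamma < \delta, \\ \varepsilon^{-\frac{1}{\alpha+\delta}} & \text{if } \gamma \ge \delta, \end{cases}$$

as $\varepsilon \to 0$ for some $C = C(\alpha, \delta, f_0)$. If $A$ satisfies Condition 3, then $Af$ has RKHS equal to

$$\mathbb{H}_{Af} = \Big\{ b = \sum_{k=1}^{\infty} b_k g_k \in \mathbb{H}_2 : \|b\|_{\mathbb{H}_{Af}}^2 = \sum_{k=1}^{\infty} b_k^2 (1+k^2)^{\alpha_0+\delta+1/2} e^{2c_0 k^\beta} < \infty \Big\}$$

and the concentration function of $Af$ satisfies

$$\phi_{Af_0}(\varepsilon) \le C \begin{cases} \big(\log\frac{1}{\varepsilon}\big)^{\frac{2\delta-2\gamma+1}{\beta}} & \text{if } \gamma + \frac{\beta}{2} < \delta, \\ \big(\log\frac{1}{\varepsilon}\big)^{1+\frac{1}{\beta}} & \text{if } \gamma + \frac{\beta}{2} \ge \delta, \end{cases}$$

as $\varepsilon \to 0$ for some $C = C(\alpha_0, \beta, \delta, f_0)$.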



Proof. It is obvious that $Af$ is a Gaussian element in $\mathbb{H}_2$ with $Af \sim N(0, A\Lambda A^*)$. By
Condition 5, $A\Lambda A^*$ has eigenvectors $\{g_k\}$ with corresponding eigenvalues $\{\tau_k^2\rho_k^2\}$. Consider
firstly the case where $A$ satisfies Condition 2. Using the above remark about Gaussian
measures in Hilbert spaces, we have that for any $b = \sum_{k=1}^{\infty} b_k g_k \in \mathbb{H}_2$,

$$b \in \mathbb{H}_{Af} \iff \|b\|_{\mathbb{H}_{Af}}^2 = \sum_{k=1}^{\infty} \frac{b_k^2}{\tau_k^2\rho_k^2} \simeq \sum_{k=1}^{\infty} b_k^2 (1+k^2)^{\alpha+\delta+1/2} = \|b\|_{H^{\alpha+\delta+1/2}(\mathbb{H}_2)}^2 < \infty,$$

so that $\mathbb{H}_{Af} = H^{\alpha+\delta+1/2}(\mathbb{H}_2)$.

Letting $f_0 = \sum_{k=1}^{\infty} f_{0,k} e_k$, define $h_j = \sum_{k=1}^{j} \rho_k f_{0,k} g_k$ to be the projection of $Af_0$ onto its
first $j$ coordinates in the conjugate basis $\{g_k\}$. Then

$$\|h_j - Af_0\|_2^2 = \sum_{k=j+1}^{\infty} \rho_k^2 |f_{0,k}|^2 \le C \sum_{k=j+1}^{\infty} (1+k^2)^{-\alpha} |f_{0,k}|^2 \le C(f_0, A)\, j^{-2\alpha-2\gamma},$$

since $f_0 \in H^\gamma$. Taking $j \simeq \varepsilon^{-1/(\alpha+\gamma)}$ gives $\|h_j - Af_0\|_2 \le \varepsilon$ and

$$\|h_j\|_{\mathbb{H}_{Af}}^2 = \sum_{k=1}^{j} \tau_k^{-2} |f_{0,k}|^2 \le C(f_0, A)\, j^{(2\delta-2\gamma+1) \vee 0} \simeq \varepsilon^{-\frac{(2\delta-2\gamma+1) \vee 0}{\alpha+\gamma}},$$

thereby giving a bound on the first term of $\phi_{Af_0}$. For the second term we use the explicit
lower bound (4.5.2) from Example 4.5 in [12]:

$$P(\|Af\|_2 \le \varepsilon) = P\Big(\sum_{k=1}^{\infty} (1+k^2)^{-\alpha-\delta-1/2} \zeta_k^2 \le \varepsilon^2\Big) \ge B\varepsilon^{p(3-w)} \exp\big(-w(1+p)^p \varepsilon^{-2p}\big),$$

where $\zeta_k$ are i.i.d. standard normals, $B > 0$ is a constant, $w = \alpha + \delta + 1/2$ and $p =
(2w-1)^{-1} = (2\alpha+2\delta)^{-1}$. Using these values gives

$$\phi_0(\varepsilon) = -\log P(\|Af\|_2 \le \varepsilon) \le -\log B - p(3-w)\log\varepsilon + w(1+p)^p \varepsilon^{-2p} \le C\varepsilon^{-1/(\alpha+\delta)}$$

as $\varepsilon \to 0$ for some constant $C = C(\alpha, \delta, B)$. Comparing these two rates, we see that the
approximation term dominates when $\gamma < \delta$ while the centred small-ball term dominates
when $\gamma \ge \delta$, thus giving the desired form for $\phi_{Af_0}(\varepsilon)$.

In the case of Condition 3, substituting in the lower bounds for the eigenvalues $\{\rho_k\}$
gives the specified $\mathbb{H}_{Af}$. If we repeat the approximation argument above, taking $h_j$ with
$j \simeq (\log\frac{1}{\varepsilon})^{1/\beta}$, then $\|h_j - Af_0\|_2 \le \varepsilon$ and

$$\|h_j\|_{\mathbb{H}_{Af}}^2 \le C j^{(2\delta-2\gamma+1) \vee 0} \simeq \Big(\log\frac{1}{\varepsilon}\Big)^{\frac{(2\delta-2\gamma+1) \vee 0}{\beta}}.$$

The centred small-ball probability can be dealt with using results on Gaussian processes that
link this quantity to the metric entropy of the unit ball of the RKHS [18]. Applying Theorem
2 of [18] and using Lemma 4.2 below, we get $\phi_0(\varepsilon) \lesssim (\log\frac{1}{\varepsilon})^{1+1/\beta}$. It is also possible to derive
this result using a careful rearrangement of the lower bounds proved in [12]. Balancing these
terms we have that this quantity dominates when $\delta \le \gamma + \frac{\beta}{2}$ and the approximation term
dominates otherwise, hence the result. □

Lemma 4.2. Consider the RKHS $\mathbb{H}_{Af}$ of $Af$ under Condition 3 as described in Lemma 4.1
and let $K_{Af}$ denote the unit ball of $\mathbb{H}_{Af}$. Then the covering number $N(K_{Af}, \|\cdot\|_{\mathbb{H}_2}, \varepsilon)$ of $K_{Af}$
with the usual Hilbert space distance satisfies

$$\log N(K_{Af}, \|\cdot\|_{\mathbb{H}_2}, \varepsilon) \lesssim \Big(\log\frac{1}{\varepsilon}\Big)^{1+1/\beta}.$$

Using this result and Theorem 2 of [18], we obtain the bound $-\log P(\|Af\|_2 \le \varepsilon) \le
C(\log\frac{1}{\varepsilon})^{1+1/\beta}$. This matches the bounds obtained in [6] when considering the general setting
of heat kernels ($\beta = 2$) on manifolds.

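Proof. Writing $b = \sum_{k=1}^{\infty} b_k g_k$, we know that for any $b \in K_{Af}$ we have $|b_k| \le C(1 +
k^2)^{-\alpha_0/2-\delta/2-1/4} e^{-c_0 k^\beta} \le Ce^{-c_0 k^\beta}$, so that $K_{Af}$ is contained in the infinite rectangle

$$\prod_{k=1}^{\infty} \big[-Ce^{-c_0 k^\beta}, Ce^{-c_0 k^\beta}\big].$$

Taking $J = D(\log\frac{1}{\varepsilon})^{1/\beta}$ for a suitable constant $D$, we have that for $k > J$, the width of the above
intervals is smaller than $\varepsilon/2$. Thus any point in the infinite rectangle is within $\varepsilon/2$ of the finite
dimensional cube $\prod_{k=1}^{J} [-Ce^{-c_0 k^\beta}, Ce^{-c_0 k^\beta}]$ and so it suffices to construct an $\varepsilon/2$ cover for
this latter set. By considering a $J$-dimensional cube, we see that it is enough to cover this set
by considering a regular lattice with distance $\varepsilon/(2\sqrt{J})$ between adjacent vertices. Therefore

$$N\Big(\prod_{k=1}^{J} \big[-Ce^{-c_0 k^\beta}, Ce^{-c_0 k^\beta}\big], \|\cdot\|_{\mathbb{H}_2}, \frac{\varepsilon}{2}\Big) \le \prod_{k=1}^{J} \frac{2Ce^{-c_0 k^\beta}}{\varepsilon/(2\sqrt{J})} = \prod_{k=1}^{J} \frac{C'\sqrt{J}\,e^{-c_0 k^\beta}}{\varepsilon}.$$

Now by a simple integral comparison test, $\sum_{k=1}^{J} k^\beta \ge J^{\beta+1}/(\beta+1)$, so that the logarithm of
the right-hand side is bounded above by

$$C'' J\Big(\log J + \log\frac{1}{\varepsilon}\Big) - c_0 \frac{J^{\beta+1}}{\beta+1} \le C'''\Big(\log\frac{1}{\varepsilon}\Big)^{1+1/\beta}.$$

□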
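The lattice-counting argument in this proof can be checked numerically. The Python sketch below counts lattice points coordinate by coordinate and returns the logarithm of the resulting bound; the function name `log_lattice_cover_bound`, the explicit choice of $J$ as the first index after which all interval widths are at most $\varepsilon/2$, and the `+ 1` endpoint convention are assumptions made here for illustration.

```python
import math

def log_lattice_cover_bound(eps, beta, c0, C=1.0):
    """Log of the number of lattice points covering the truncated rectangle.

    The rectangle has sides [-C e^{-c0 k^beta}, C e^{-c0 k^beta}], k = 1..J,
    and the lattice spacing is eps / (2 sqrt(J)). Requires eps < 4 C.
    Returns (J, log of the lattice cardinality).
    """
    # For k > J the width 2 C exp(-c0 k^beta) is below eps / 2.
    J = max(1, math.ceil((math.log(4.0 * C / eps) / c0) ** (1.0 / beta)))
    spacing = eps / (2.0 * math.sqrt(J))
    total = 0.0
    for k in range(1, J + 1):
        width = 2.0 * C * math.exp(-c0 * k**beta)
        total += math.log(math.ceil(width / spacing) + 1)  # points along side k
    return J, total
```

Comparing the returned value with $(\log\frac{1}{\varepsilon})^{1+1/\beta}$ for decreasing $\varepsilon$ gives a bounded ratio in the cases tried ($\beta = 1$ and $\beta = 2$ with $c_0 = C = 1$), in line with the lemma.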

Proof of Proposition 2.5. Let us verify the small ball condition (3.6). Let $\mathbb{H}_{Af}$ denote the
RKHS of $Af$ and $\phi_{Af_0}$ denote the concentration function of $Af$ at $Af_0$. Since $Af$ is a
Gaussian random element in $\mathbb{H}_2$, we have by Theorem 2.1 of [27] that if $Af_0$ is contained in
the $\mathbb{H}_2$-closure of $\mathbb{H}_{Af}$ and $\varepsilon_n$ satisfies $\phi_{Af_0}(\varepsilon_n) \le n\varepsilon_n^2$, then $P(\|Af - Af_0\|_2 < 2\varepsilon_n) \ge e^{-n\varepsilon_n^2}$.

By Lemma [4. 1| the choice e n = n~ 2a + 2S + 1 satisfies this condition in both the cases 7 > 5 and 
7 < S, thereby verifying (|3.6|) . 

Recall that we have Karhunen-Loève expansion $f = \sum_{k=1}^{\infty} \tau_k \zeta_k e_k$, where $\{\zeta_k\}$ are i.i.d.
standard normal random variables. Proceeding as in the proof of Proposition 2.4 and taking
$k_n \simeq n\varepsilon_n^2$ in (4.5), we obtain that for $n \ge N$,

$$P\big(\|P_{k_n}(f) - f\|_1 \ge M(L, N)(n\varepsilon_n^2)^{-\delta}\big) \le e^{-Ln\varepsilon_n^2},$$

where the constant $M$ increases with $L$. Letting $\mathcal{P}_n = \{f \in \mathbb{H}_1 : \|P_{k_n}(f) - f\|_1 \le M\xi_n\}$
for a sufficiently large constant $M$, we have $\Pi(\mathcal{P}_n^c) \le e^{-Ln\varepsilon_n^2}$ as long as $(n\varepsilon_n^2)^{-\delta} \le C\xi_n$.
This is satisfied by our above choice of $\varepsilon_n$ and so, choosing $L$ sufficiently large to match
the constant obtained in the small-ball probability above, this verifies (3.5). Finally, since
$f_0 \in H^\gamma$, we again recover that $\|P_{k_n}(f_0) - f_0\|_1 \le Ck_n^{-\gamma}\|f_0\|_{H^\gamma} \simeq (n\varepsilon_n^2)^{-\gamma}$, which is smaller
than $\xi_n = \varepsilon_n(n\varepsilon_n^2)^{\alpha}$ for our choice of $\varepsilon_n$. Applying Theorem 3.1 completes the proof. □
Proof of Proposition 2.6. Consider firstly the case where $\gamma + \frac{\beta}{2} < \delta$. As above, (3.6) is verified
if $\phi_{Af_0}(\varepsilon_n) \le n\varepsilon_n^2$. By Lemma 4.1, the choice $\varepsilon_n = (\log n)^{\frac{\delta-\gamma+1/2}{\beta}} n^{-1/2}$ satisfies this condition.
Now let $k_n$ be an integer satisfying $(L_1 \log n)^{1/\beta} \le k_n \le (L_2 \log n)^{1/\beta}$ for some constants $L_1$,
$L_2$, and which therefore satisfies $k_n \le cn\varepsilon_n^2$ for some constant $c$ and the above choice of $\varepsilon_n$.
The quantity in the left-hand side of (3.3) then satisfies

$$\frac{\varepsilon_n}{\delta_{k_n}} \le C\varepsilon_n (1+k_n^2)^{\alpha_0/2} e^{c_0 k_n^\beta} \le C'(\log n)^{\frac{\alpha_0+\delta-\gamma+1/2}{\beta}} n^{L_2 c_0 - 1/2} = o\big((\log n)^{-\gamma/\beta}\big)$$

as $n \to \infty$ provided that $L_2 c_0 < 1/2$. To verify (3.5), substitute our choices of $\varepsilon_n$ and $k_n$ into
(4.5) to get

$$e^{-Ln\varepsilon_n^2} \ge P\big(\|P_{k_n}(f) - f\|_1 \ge C(\log n)^{-\delta/\beta} + C'(\log n)^{-\gamma/\beta}\big).$$

Since $\delta > \gamma + \frac{\beta}{2}$ the second term is asymptotically larger, so that taking $L$ sufficiently large,
we obtain the required exponential inequality (3.5) with rate $(\log n)^{-\gamma/\beta}$. Since $f_0 \in H^\gamma$, we
have that exactly as above $\|P_{k_n}(f_0) - f_0\|_1 \le Ck_n^{-\gamma} \le C'(\log n)^{-\gamma/\beta}$, so that we can apply
Theorem 3.1.

Consider now the case where $\gamma + \frac{\beta}{2} \ge \delta$. Arguing as above, $\varepsilon_n = (\log n)^{\frac{\beta+1}{2\beta}} n^{-1/2}$ satisfies
the small-ball condition (3.6) and for the bias we recover the exponential inequality

$$e^{-Ln\varepsilon_n^2} \ge P\big(\|P_{k_n}(f) - f\|_1 \ge C(\log n)^{-\frac{\delta-\beta/2}{\beta}}\big).$$

By our choice of $\delta$, the above rate is larger than the bias of $f_0$ and so yields the rate. □
4.3 Proofs of Section 2.3 (Uniform wavelet series)

Since we are working in the deconvolution setting described in Section 1.4.1, we firstly note
that the Sobolev scale with respect to the Fourier basis corresponds to the classical notion
of Sobolev smoothness on $\mathbb{T}$, so that $H^s(\mathbb{H}_1) = H^s([0,1])$. As mentioned above, periodized
Meyer wavelets are band limited and so satisfy Condition 1, which is needed for Theorem 3.1.
Moreover, since $\mathrm{supp}(\mathcal{F}_{\mathbb{T}}[\psi]) \subset [-a, a]$ for some $a > 0$, we have by the standard properties
of the Fourier transform that the dilated and translated wavelets satisfy $\mathrm{supp}(\mathcal{F}_{\mathbb{T}}[\psi_{jk}]) \subset
[-2^j a, 2^j a]$. Recalling definition (3.2), we therefore have that under Condition 2,

821 = inf \Fy\p\{m)\ < C(l + 2^)~ a ' 2 . 

m£Z:\m\<23a 

Since the ill-posedness affects the rate $\xi_n$ through (3.3), we see that using the periodized Meyer
wavelet basis rather than the SVD (Fourier basis) only affects the constants and does not
negatively affect the rate. In this section note that $\|\cdot\|_2$ refers to the $L^2([0,1])$-norm rather
than the $\mathbb{H}_2$-norm.



Proof of Proposition 2.7. We firstly verify the small-ball condition (3.6). Consider the case
where $\delta \le \gamma$. Using the wavelet characterization of the periodic Besov space $B^s_{22}([0,1]) =
H^s([0,1])$ for $s \in \mathbb{R}$ gives



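$$
\|h\|_{H^s}^2 = \Big( |a(h)| + \Big( \sum_{l=0}^\infty \big( 2^{ls} \|\beta_l(h)\|_2 \big)^2 \Big)^{1/2} \Big)^2
\le 2|a(h)|^2 + 2\sum_{l=0}^\infty 2^{2ls} \|\beta_l(h)\|_2^2
\le 4\max\Big\{ |a(h)|^2, \sum_{l=0}^\infty 2^{2ls} \sum_{k=0}^{2^l-1} |\beta_{lk}(h)|^2 \Big\}. \tag{4.6}
$$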



Let $a$, $\beta_{lk}$ denote the wavelet coefficients of $f_0$ and note that if $\|f_0\|_{C^\gamma} \le B$ then $|a| \le B$
and $|\beta_{lk}| \le B 2^{-l(\gamma+1/2)}$ for all $l, k$. By (4.6),



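$$
\begin{aligned}
P(\|f_0 - U_\delta\|_{H^{-\alpha}} \le \varepsilon_n)
&\ge P\Big( \max\Big\{ |a - u|^2, \sum_{l=0}^\infty 2^{-2l\alpha} \sum_{k=0}^{2^l-1} |\beta_{lk} - 2^{-l(\delta+1/2)} u_{lk}|^2 \Big\} \le c_1 \varepsilon_n^2 \Big) \\
&= P\big( |a - u|^2 \le c_1 \varepsilon_n^2 \big) \, P\Big( \sum_{l=0}^\infty 2^{-2l\alpha} \sum_{k=0}^{2^l-1} |\beta_{lk} - 2^{-l(\delta+1/2)} u_{lk}|^2 \le c_1 \varepsilon_n^2 \Big),
\end{aligned} \tag{4.7}
$$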

using the independence of $u$ and the $u_{lk}$'s. The first probability satisfies

$$
P\big( |a - u| \le \sqrt{c_1}\, \varepsilon_n \big) \ge \frac{\sqrt{c_1}\, \varepsilon_n}{2B} = e^{c_2 + \log(\varepsilon_n/B)} \ge e^{c_3 \log(\varepsilon_n/B)}
$$

for some constant $c_3 = c_3(\phi, \psi)$. Let $b_{lk} = 2^{l(\gamma+1/2)} \beta_{lk}$ and pick $J = J(n)$ as defined below.
The second probability in (4.7) becomes


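$$
\begin{aligned}
& P\Big( \sum_{l=0}^\infty 2^{-l(2\alpha+2\gamma+1)} \sum_{k=0}^{2^l-1} |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}|^2 \le c_1 \varepsilon_n^2 \Big) \\
&\quad \ge P\Big( \sum_{l=0}^\infty 2^{-2l(\alpha+\gamma)} \sup_{0 \le k < 2^l} |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}|^2 \le c_1 \varepsilon_n^2 \Big) \\
&\quad \ge P\Big( \sum_{l=0}^{J} 2^{-2l(\alpha+\gamma)} \sup_{0 \le k < 2^l} |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}|^2 + C B^2 \sum_{l=J+1}^\infty 2^{-2l(\alpha+\delta)} \le c_1 \varepsilon_n^2 \Big).
\end{aligned}
$$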



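Pick the truncation level $J = J(n)$ so that $B^2 \sum_{l=J+1}^\infty 2^{-2l(\alpha+\delta)} \asymp B^2 2^{-2J(\alpha+\delta)} \asymp \varepsilon_n^2$, that is
$2^J \asymp (\varepsilon_n/B)^{-1/(\alpha+\delta)}$. Note that since $|b_{lk}| \le B$ and $\delta < \gamma$, we can lower bound the individual
probabilities via

$$
P\big( |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}| \le c\varepsilon_n \big) \ge \frac{c\varepsilon_n}{2^{l(\gamma-\delta)+1} B} > 0.
$$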
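Then, choosing the constants defining $J(n)$ appropriately, we have

$$
\begin{aligned}
& P\Big( \sum_{l=0}^{J} 2^{-2l(\alpha+\gamma)} \sup_{0 \le k < 2^l} |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}|^2 \le c_1 \varepsilon_n^2 - c(\alpha,\delta) B^2 2^{-2J(\alpha+\delta)} \Big) \\
&\quad \ge P\Big( \max_{0 \le l \le J} \sup_{0 \le k < 2^l} |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}| \le c_4 \varepsilon_n \Big) \\
&\quad = \prod_{l=0}^{J} \prod_{k=0}^{2^l-1} P\big( |b_{lk} - 2^{-l(\delta-\gamma)} u_{lk}| \le c_4 \varepsilon_n \big)
\ge \prod_{l=0}^{J} \prod_{k=0}^{2^l-1} \frac{c_4 \varepsilon_n}{2^{l(\gamma-\delta)+1} B} \\
&\quad \ge \exp\Big( c_5 \log(\varepsilon_n/B) \sum_{l=0}^{J} 2^l - c_6 \sum_{l=0}^{J} l 2^l \Big)
\ge e^{c_7 2^J (\log(\varepsilon_n/B) - J)} = e^{c_8 (\varepsilon_n/B)^{-1/(\alpha+\delta)} \log(\varepsilon_n/B)}
\end{aligned}
$$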

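for $n \ge N(\alpha, \delta, B, \psi)$, where we have used that $J \asymp -\log(\varepsilon_n/B)$ in the last line. Using (4.7)
and that $h(B_0) > 0$ for some $B_0 > \|f_0\|_{C^\gamma}$, we have that for $n \ge N(\alpha, \delta, B_0, \psi)$,

$$
P(\|f_0 - U_\delta\|_{H^{-\alpha}} \le \varepsilon_n) \ge h(B_0)\, e^{c_3 \log(\varepsilon_n/B_0)}\, e^{c_9 (\varepsilon_n/B_0)^{-1/(\alpha+\delta)} \log(\varepsilon_n/B_0)} \ge e^{c_{10} \varepsilon_n^{-1/(\alpha+\delta)} \log \varepsilon_n}, \tag{4.8}
$$

so that (3.6) is satisfied by the choice $\varepsilon_n \asymp (\log n / n)^{(\alpha+\delta)/(2\alpha+2\delta+1)}$.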
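As a rough numerical illustration of the truncation step in the case $\delta < \gamma$, the following sketch finds the smallest $J$ with $B^2 \sum_{l > J} 2^{-2l(\alpha+\delta)} \le c\,\varepsilon^2$ and compares $2^J$ with $(\varepsilon/B)^{-1/(\alpha+\delta)}$. The function names, the constant `c` and the parameter values are assumptions chosen for illustration only and are not taken from the paper.

```python
import math


def tail_sum(J, alpha, delta):
    """Closed form of sum_{l=J+1}^infinity 2^(-2*l*(alpha+delta)) (geometric series)."""
    q = 2.0 ** (-2.0 * (alpha + delta))
    return q ** (J + 1) / (1.0 - q)


def truncation_level(eps, B, alpha, delta, c=1.0):
    """Smallest J >= 0 with B^2 * tail_sum(J) <= c * eps^2.

    The constant c is a hypothetical stand-in for the unspecified
    constants in the proof.
    """
    J = 0
    while B * B * tail_sum(J, alpha, delta) > c * eps * eps:
        J += 1
    return J


def scale_ratio(eps, B, alpha, delta):
    """Ratio 2^J / (eps/B)^(-1/(alpha+delta)); it should stay bounded."""
    J = truncation_level(eps, B, alpha, delta)
    return 2.0 ** J / (eps / B) ** (-1.0 / (alpha + delta))


if __name__ == "__main__":
    # Illustrative values only.
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        print(eps, truncation_level(eps, 2.0, 1.0, 0.5), scale_ratio(eps, 2.0, 1.0, 0.5))
```

For these values the ratio stays within fixed constants as $\varepsilon$ decreases, which is the sense in which $2^J$ is of order $(\varepsilon_n/B)^{-1/(\alpha+\delta)}$.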



Consider now the case $\gamma \le \delta < \gamma + 1/\nu$, where we can establish (3.6) in a similar fashion by
using an approximation argument. Recall that $f_0 \in C^\gamma([0,1])$ and let $h_r$ be the best
$H^{-\alpha}$-approximation of $f_0$ such that $\|h_r\|_{C^\delta} \le r$. Write
$h_r = \theta_0 \phi + \sum_{l=0}^\infty \sum_{k=0}^{2^l-1} \theta_{lk} \psi_{lk}$, where $|\theta_0| \le r$
and $|\theta_{lk}| \le r 2^{-l(\delta+1/2)}$, and recall that the wavelet coefficients of $f_0$ satisfy $|\beta_{lk}| \le B_0 2^{-l(\gamma+1/2)}$
for $B_0 > \|f_0\|_{C^\gamma}$. Let $l_r$ be the smallest integer such that $2^{l_r(\delta-\gamma)} > r/B_0$, so that in particular
$\theta_{lk} = \beta_{lk}$ for all $l < l_r$. Then

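$$
\|f_0 - h_r\|_{H^{-\alpha}}^2 = \sum_{l=l_r}^\infty \sum_{k=0}^{2^l-1} 2^{-2l\alpha} |\beta_{lk} - \theta_{lk}|^2
\le \sum_{l=l_r}^\infty 2^{-2l(\alpha+\gamma)} \big( B_0 - r 2^{-l(\delta-\gamma)} \big)^2 \le C B_0^2 2^{-2 l_r (\alpha+\gamma)},
$$

so that $\|f_0 - h_r\|_{H^{-\alpha}} \le C(f_0)\, r^{-(\alpha+\gamma)/(\delta-\gamma)}$ by the definition of $l_r$. Pick $r_n$ to be the smallest integer
such that $C(f_0)\, r_n^{-(\alpha+\gamma)/(\delta-\gamma)} \le \varepsilon_n/2$, so that by the triangle inequality,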

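$$
P(\|f_0 - U_\delta\|_{H^{-\alpha}} \le \varepsilon_n) \ge P(\|h_{r_n} - U_\delta\|_{H^{-\alpha}} \le c\varepsilon_n)
$$

for some $1/2 \le c < 1$. Since $\|h_{r_n}\|_{C^\delta} \le r_n$, we use (4.8) to obtain

$$
P(\|f_0 - U_\delta\|_{H^{-\alpha}} \le \varepsilon_n) \ge h(r_n) \exp\Big( c_1 \Big( \frac{\varepsilon_n}{r_n} \Big)^{-1/(\alpha+\delta)} \log \frac{\varepsilon_n}{r_n} \Big) \ge h(r_n) \exp\Big( -c_2\, r_n^{1/(\delta-\gamma)} \log r_n \Big).
$$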

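Since $\delta < \gamma + 1/\nu$, $h(r) \ge e^{-D r^\nu}$ for all $r \in \mathbb{N}$, and $r_n \ge c_3\, \varepsilon_n^{-(\delta-\gamma)/(\alpha+\gamma)}$ for some $c_3 > 0$ and sufficiently
large $n$, we obtain the lower bound $\exp\big( -d_2\, \varepsilon_n^{-1/(\alpha+\gamma)} \log(1/\varepsilon_n) \big)$. Bounding this from below by
$e^{-Cn\varepsilon_n^2}$ yields the choice $\varepsilon_n = (\log n / n)^{(\alpha+\gamma)/(2\alpha+2\gamma+1)}$.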
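The balance behind this choice of rate can be checked numerically. The sketch below assumes that the small-ball exponent has the form $\varepsilon^{-1/(\alpha+s)} \log(1/\varepsilon)$, where $s$ is a hypothetical shorthand for $\delta$ in the case $\delta < \gamma$ and for $\gamma$ in the case $\gamma \le \delta < \gamma + 1/\nu$, and confirms that this exponent is bounded by a constant multiple of $n\varepsilon^2$ when $\varepsilon = (\log n/n)^{(\alpha+s)/(2\alpha+2s+1)}$. Function names and parameter values are illustrative assumptions.

```python
import math


def small_ball_rate(n, alpha, s):
    """eps_n = (log n / n)^((alpha+s) / (2*alpha + 2*s + 1))."""
    a = alpha + s
    return (math.log(n) / n) ** (a / (2.0 * a + 1.0))


def exponent_ratio(n, alpha, s):
    """Ratio of eps^(-1/(alpha+s)) * log(1/eps) to n * eps^2 at eps = small_ball_rate."""
    eps = small_ball_rate(n, alpha, s)
    small_ball_exponent = eps ** (-1.0 / (alpha + s)) * math.log(1.0 / eps)
    return small_ball_exponent / (n * eps * eps)


if __name__ == "__main__":
    # Illustrative values only; the ratio should remain below 1/2.
    for n in (10**3, 10**6, 10**9):
        print(n, exponent_ratio(n, alpha=1.0, s=0.5))
```

Algebraically the ratio equals $\log(1/\varepsilon)/\log n$, which is at most $(\alpha+s)/(2\alpha+2s+1) < 1/2$, so a bound of the form $e^{-Cn\varepsilon_n^2}$ holds with a fixed constant $C$.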
Consider now the bias condition (3.5) and, using the notation of wavelets, take
$k_n = 2^{J_n} \asymp n\varepsilon_n^2$. Let $B_r = \operatorname{supp}(\Pi_{\delta,r})$ denote the $C^\delta([0,1])$-ball of radius $r$. Let $r_n$ be an integer
satisfying $(L_1 n\varepsilon_n^2)^{1/\nu} \le r_n \le (L_2 n\varepsilon_n^2)^{1/\nu}$ for some constants $L_1$, $L_2$ and take $\mathcal{P}_n = B_{r_n}$. Then
$\Pi(\mathcal{P}_n^c) = 1 - H(r_n) \le e^{-D r_n^\nu} \le e^{-Ln\varepsilon_n^2}$, where $L$ is a constant that can be made sufficiently
large by increasing $L_1$. Now for all functions $f \in B_r$, $\sup_k |\beta_{lk}(f)| \le r 2^{-l(\delta+1/2)}$ for all $l \ge 0$.
Consequently,

$$
\|K_{J_n}(f) - f\|_2^2 = \sum_{l=J_n}^\infty \sum_{k=0}^{2^l-1} |\beta_{lk}(f)|^2 \le \sum_{l=J_n}^\infty \sum_{k=0}^{2^l-1} r^2 2^{-l(2\delta+1)} \le C r^2 2^{-2\delta J_n},
$$

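so that for all $f \in \mathcal{P}_n$,

$$
\|K_{J_n}(f) - f\|_2 \le C' (n\varepsilon_n^2)^{1/\nu - \delta} \le C'' \xi_n = C''' \varepsilon_n (n\varepsilon_n^2)^{\alpha},
$$

which is verified with the choice $\varepsilon_n = n^{-(\alpha+\delta-1/\nu)/(2\alpha+2\delta-2/\nu+1)}$. Comparing this rate to the rates obtained
when verifying (3.6), we obtain the minimal choices $\varepsilon_n = n^{-(\alpha+\delta-1/\nu)/(2\alpha+2\delta-2/\nu+1)}$ when $\delta < \gamma + 1/\nu$ and
$\varepsilon_n = (\log n/n)^{(\alpha+\gamma)/(2\alpha+2\gamma+1)}$ when $\delta = \gamma + 1/\nu$. For the true function $f_0 \in C^\gamma([0,1])$, using a standard
approximation bound gives $\|K_{J_n}(f_0) - f_0\|_2 \le C(f_0) 2^{-\gamma J_n} \asymp (n\varepsilon_n^2)^{-\gamma} = O(\xi_n)$ for all the
above choices of $\varepsilon_n$. For these two cases, apply Theorem 3.1 to obtain the contraction rate $\xi_n$.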

Consider now the stronger tail condition $1 - H(r) \le \exp(-e^{D r^\nu})$ as $r \to \infty$ for some
$\nu > 0$. When $\delta < \gamma$, (3.6) is satisfied as above by the choice $\varepsilon_n \asymp (\log n/n)^{(\alpha+\delta)/(2\alpha+2\delta+1)}$. Letting
$r_n$ be an integer satisfying $(\log(L_1 n\varepsilon_n^2))^{1/\nu} \le r_n \le (\log(L_2 n\varepsilon_n^2))^{1/\nu}$ for some constants $L_1$,
$L_2$ and taking $\mathcal{P}_n$ as above, we obtain $\Pi(\mathcal{P}_n^c) \le \exp(-e^{D r_n^\nu}) \le e^{-Ln\varepsilon_n^2}$ for some constant
$L$ that can be made arbitrarily large by increasing $L_1$. Using the above bias calculations,
$\|K_{J_n}(f) - f\|_2 \le C r_n 2^{-\delta J_n} \le C' r_n (n\varepsilon_n^2)^{-\delta}$ and so setting this equal to $\xi_n = n^\alpha \varepsilon_n^{2\alpha+1}$ yields
that (3.5) is satisfied by the choice $\varepsilon_n = (\log n)^{(1/\nu)/(2\alpha+2\delta+1)}\, n^{-(\alpha+\delta)/(2\alpha+2\delta+1)}$. Substituting this expression
into that of $\xi_n$ gives the desired contraction rate. □
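The two balancing equations used for the bias condition can be sanity-checked numerically. The sketch below assumes $\xi_n = \varepsilon_n (n\varepsilon_n^2)^\alpha$, the bias bound $(n\varepsilon_n^2)^{1/\nu-\delta}$ under the tail $e^{-Dr^\nu}$, and the bias bound $(\log n)^{1/\nu}(n\varepsilon_n^2)^{-\delta}$ under the stronger tail, with all constants set to one. Function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def xi(n, eps, alpha):
    """Contraction rate xi_n = eps * (n * eps^2)^alpha = n^alpha * eps^(2*alpha+1)."""
    return eps * (n * eps * eps) ** alpha


def bias_rate(n, alpha, delta, nu):
    """eps_n solving (n*eps^2)^(1/nu - delta) = xi_n; assumes alpha + delta - 1/nu > 0."""
    x = alpha + delta - 1.0 / nu
    return n ** (-x / (2.0 * x + 1.0))


def log_tail_rate(n, alpha, delta, nu):
    """eps_n solving (log n)^(1/nu) * (n*eps^2)^(-delta) = xi_n."""
    a = alpha + delta
    return math.log(n) ** ((1.0 / nu) / (2.0 * a + 1.0)) * n ** (-a / (2.0 * a + 1.0))


if __name__ == "__main__":
    n, alpha, delta, nu = 10**6, 1.0, 1.5, 2.0  # illustrative values only
    eps = bias_rate(n, alpha, delta, nu)
    bias = (n * eps**2) ** (1.0 / nu - delta)
    print(math.isclose(bias, xi(n, eps, alpha), rel_tol=1e-9))

    eps2 = log_tail_rate(n, alpha, delta, nu)
    bias2 = math.log(n) ** (1.0 / nu) * (n * eps2**2) ** (-delta)
    print(math.isclose(bias2, xi(n, eps2, alpha), rel_tol=1e-9))
```

Both comparisons check an exact algebraic identity, so they hold up to floating-point error for any admissible parameters, not only the values chosen here.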